Omega  Alpha 

edited  by 
D. Tsichritzis

Technical  Report  CSRG-127 
March, 1981


COMPUTER  SYSTEMS  RESEARCH  GROUP 

UNIVERSITY  OF  TORONTO 


Computer  Systems  Research  Group 
University  of  Toronto 
Toronto,  Ontario 
Canada, M5S 1A1


The Computer Systems Research Group (CSRG) is an interdisciplinary group
formed to conduct research and development relevant to computer systems and
their application. It is jointly administered by the Department of Electrical
Engineering and the Department of Computer Science of the University of Toronto,
and is supported in part by the Natural Sciences and Engineering Council of
Canada.


© 1981, Computer Systems Research Group, University of Toronto.




This  report  is  yet  another  collection  of  papers  from  the  database  group  at  the 
University  of  Toronto.  All  the  papers  are  related  to  Office  Information  Systems 
(OIS). While we still carry on significant research in database systems, this is
not  reported  in  this  collection  of  papers. 

One of the reasons that the OIS area is so interesting is the convergence of
different areas and approaches into a set of common problems. Artificial
intelligence, graphics, interactive interfaces, data base management, local and global
networks, word processing and organizational behavior are examples of
research areas which are applicable to OIS. This convergence brings different
perspectives, different techniques and many arguments. In fact, all these areas,
and more, are needed to investigate problems in OIS.

Our  approach  to  OIS  is  heavily  influenced  by  database  techniques.  We  try  to 
impose structure on data and take advantage of that structure, to a fault. After all, we
were and remain database researchers. We are, however, slowly being
influenced  by  some  of  the  great  ideas  present  in  other  areas.  In  fact,  our 
involvement  in  OIS  has  been  a  great  educational  experience. 

The  work  presented  in  these  papers  is  neither  complete  nor  exhaustive.  It 
just  gives  a  snapshot  of  where  we  are  and  where  we  are  going.  Some  of  our 
thoughts are tentative. We communicate them anyway in the hope of
stimulating an exchange of ideas. Many people are becoming interested in OIS.
Maybe this report can provide them with some research directions and
examples of what can be done.

OIS is yet another area where the practitioners are way ahead of the
thinkers. There are systems being built, projects being initiated and companies being
formed to respond to the users' pressing needs. In the meantime, there is very
little research and very few people actively expanding the horizons. We have a
lot of catching up to do. Academics used to propose the blue sky ideas which
the companies were hesitant to implement. We are getting to the point where
it is a struggle to keep up with what's available in the market place. Sometimes
faculty and students in universities cannot even understand the advertisements
in Datamation. Under these circumstances it is debatable whether we are
leading the computer science activity. It is even debatable whether we are even
following it.


Editor 
Office Information Systems: Challenge for the 80’s*


D.C. Tsichritzis
F.K.  Lochovsky 


Computer  Systems  Research  Group 
University  of  Toronto 
Toronto,  Canada 
M5S 1A1


ABSTRACT 

Today’s  office  is  plagued  by  problems  of  rising  costs  and  low  productivity.  Office 
automation  and  office  information  systems  are  proposed  as  a  possible  solution  to  many 
of  the  information  handling  problems  of  the  office.  The  technology  is  available  and  the 
market  place  is  ready  for  office  automation.  However,  several  challenges  need  to  be 
met  before  the  proposed  remedy  can  be  applied  effectively.  Automated  solutions  to 
the  different  aspects  of  the  office  information  handling  problem  need  to  be  integrated. 
Models  and  techniques  need  to  be  developed  to  represent  and  analyze  information  flow 
in  an  office.  Interfaces  need  to  be  developed  that  are  easy  to  use  and  integrate  many 
different  capabilities.  Finally,  one  must  examine  the  impact  on  people  of  office 
automation  and  produce  solutions  that  are  acceptable  to  the  end-users.  To  attack 
these  problems,  human  factors,  software  and  hardware  engineering  techniques  have  to 
be  brought  to  bear. 

Keywords and Phrases: office automation, office information systems, data base
management, data communications, office modelling, human factors, software
engineering

CR  Categories:  2.11,  3.3,  3.5,  3.7,  3.81,  4.2,  4.33,  6.35,  8.1 
Office Information Systems


1  INTRODUCTION 

The technological revolution in hardware that began with the introduction of the
microprocessor is now beginning to affect areas of everyday life. The reduction in
price of microprocessors, together with their packaging and proliferation, is generating many new
applications for computers. Perhaps the biggest potential use of microcomputers, and
their most important impact, will be in the office. There are many factors, both
technical and economic, that point to the need for office automation and office
information systems.

From  a  technical  standpoint,  the  reduction  in  cost  of  computer  processing  power 
is  a  very  strong  driving  force  for  office  automation.  As  a  result,  office  equipment  is 
becoming more intelligent and more integrated. As an example, word processing
integrates  text  preparation,  text  editing  and  text  production  using  computerized 
equipment.  It  has  now  become  feasible  for  even  a  small  office  to  install  a  computer 
system  to  be  used  for  routine  office  work  as  opposed  to  traditional  data  processing. 

Economically,  it  makes  sense  to  invest  more  in  a  cheap  resource  in  order  to 
better  utilize  an  expensive  resource.  When  the  cost  of  labor  is  low  and  that  of  hardware 
is high, it is cheaper to hire more people than to invest in expensive office equipment.
In the past the capitalization per office worker has been very low [Business Week, 1975;
Zisman, 1977]. Today, the cost of people is steadily increasing and the cost of
computerized office equipment is falling. In addition, the information handling
capabilities of the equipment are becoming more sophisticated. It thus makes sense to
invest  a  little  more  in  the  equipment  in  an  attempt  to  better  utilize  the  available 
people. 

Today,  large  organizations  are  facing  limits  to  growth  because  they  cannot 
adequately  control  their  operations.  Good  and  timely  information  for  decision  making 
is  becoming  harder  and  more  expensive  to  obtain.  The  process  of  acquiring,  collating, 
storing  and  disseminating  information  is  centered  mainly  in  the  office.  For  the 
situation  to  improve,  the  nature  of  information  handling  in  the  organization  in  general 
and  in  the  office  in  particular  must  change. 

It  is  not  possible  or  realistic  to  think  we  can  go  back  to  small,  informally  run 
organizations.  On  the  contrary,  governments,  as  a  primary  example,  are  faced  with  an 
increasing  demand  for  services.  At  the  same  time,  they  are  trying  to  reduce  costs  and 
limit  increases  in  spending.  A  prime  area  for  improvement  is  office  worker 
productivity  which  has  traditionally  been  low  [Business  Week,  1975].  Thus,  if  costs  are 
to be reduced or held in check and service is also to be maintained or increased,
productivity  must  increase. 

All  of  these  factors  point  to  the  need  for  automation  of  the  information  handling 
and  information  processing  functions  of  the  office.  The  market  place  is  ready  for  such 
a  breakthrough.  The  raw  technology  is  almost  here  for  the  wholesale  introduction  of 
computers  in  offices.  Word  processing,  electronic  mail  and  electronic  files  are  not 
futuristic  ideas.  These  are  techniques  which  have  been  used  in  the  past.  Their  use  was 
restricted,  however,  to  sophisticated  environments  in  which  people  were  knowledgeable 
about  computers  and  had  access  to  sophisticated  and  expensive  computer  equipment. 


Today,  the  equipment  can  be  obtained  at  affordable  prices.  However,  it  is 
important  to  realize  that  solutions  to  the  office  information  handling  problem  cannot 
be  provided  solely  in  terms  of  technological  innovation.  A  balanced  and  evolutionary 
approach  is  needed  for  the  introduction  and  the  proper  use  of  technology  in  the  office 
on a large scale. This will allow adequate time for changes in attitude and development
of  proper  procedures  in  the  new  environment. 

The  problem  of  office  automation  is  not  a  problem  of  only  data  processing  or 
communications  or  the  right  equipment.  Currently,  solutions  to  each  of  these 
problems  are  provided  by  different  sectors  of  the  market  place.  Office  information 
systems  require  the  combination  of  all  of  these  individual  aspects  of  the  information 
handling  problems  of  the  office  in  an  integrated  solution.  Thus  office  automation 
provides  a  common  battleground  where  many  different  technologies  and  companies 
meet.  Each  participant  in  the  field  has  a  distinct  vantage  point  and  some  unique 
advantages  in  terms  of  development  and  use  of  office  information  systems.  The  area 
provides  unique  opportunities  for  many  organizations  to  participate  as  well  as  many 
challenges  for  them. 


2  MARKET  FORCES  AND  PARTICIPANTS 

To  understand  the  nature  of  and  impetus  for  office  automation,  it  is  necessary  to 
know  what  the  market  forces  are  and  who  the  participants  are  in  the  field.  Office 
automation  requires  the  application  and  integration  of  many  different  technologies. 
Facilities  such  as  communications,  text  preparation,  forms  processing  and  data 
management  need  to  be  provided.  The  technologies  for  these  facilities  already  exist 
and are being marketed by different companies. However, each of these companies
has expertise in only a small area of the total field of office information systems. In
order  to  enter  the  office  information  systems  field,  they  will  need  to  broaden  their  base 
of  operation. 

There  are  (so  far)  at  least  three  different  market  sectors  where  companies  are 
poised  to  enter  the  office  information  systems  market.  These  sectors  are 
communications,  office  equipment  and  data  processing.  Each  sector  has  a  tradition,  a 
proven  technology  and  an  existing  market  penetration.  Each  one  has  a  solution  for  a 
distinct  aspect  of  office  automation.  However,  the  users  will  demand  integrated 
solutions  not  piecewise  approaches.  This  implies  that,  to  compete  for  the  office 
information systems market, all of the companies have to move away from their proven
ground  to  new  (for  them)  areas.  Their  past  experience  and  tradition  will  necessarily 
shape  the  kinds  of  solutions  they  provide. 

The communications sector will naturally concentrate on electronic mail. There
are  several  different  types  of  communications  companies  that  can  compete  in  this 
area.  The  telephone  companies  appear  to  be  the  obvious  favorite  for  the  market. 
Although  they  are  primarily  in  the  audio  communications  business,  they  moved  fairly 
naturally  into  data  communications.  They  can  easily  expand  into  general  electronic 
mail. In addition, their local PBX's can be expanded to local inter-office
communications.

There  are,  however,  also  other  communications  companies  who  may  compete  in 
the  area.  The  telegram  companies  which,  through  telex,  compete  with  telephone 
companies  are  also  in  the  message  sending,  and  therefore,  electronic  mail  business. 
The  independent  satellite  network  companies  can  start  providing  electronic  mail 
services in addition to telecommunications channels. There are the cable TV
companies which already have in-place communications channels. They do not have to
pump  only  TV  programs  through  their  channels.  They  can  also  provide  electronic  mail. 
Finally,  the  post  office  in  some  countries  is  realizing  that  it  may  lose  out  on  the  most 
lucrative  part  of  its  business.  Hence,  it  has  to  move  into  electronic  mail  and  data 
communications,  either  alone  or  in  combination  with  other  companies,  e.g.,  telepost. 

Office  equipment  and  supply  companies  are  another  market  sector  that  is  moving 
into  office  automation.  Companies  are  moving  from  supplying  just  office  equipment 
components  to  providing  more  integrated  office  work  stations  that  can  do  word 
processing, personal computing, copying/printing and whatever services are now
provided  by  diverse  equipment  from  different  companies.  Copier  companies  are 
moving  into  office  terminals,  e.g.,  Xerox.  Typewriter  companies  are  moving  into  office 
terminals,  e.g.,  IBM  and  Olivetti.  Office  supply  companies  are  planning  their  own  office 
work station, e.g., Moore Industries. Finally, the traditional electronic companies are
moving into office systems, e.g., Hitachi and Philips. All of these companies will
compete  feverishly  to  capture  the  office  work  station  market. 

The  data  processing  sector  of  the  market  is  perhaps  the  best  placed  to  compete 
in  and  influence  office  automation.  Data  processing  companies  can  already  provide 
electronic  mail  and  word  processing  through  their  centralized  systems.  They  all  are 
moving into distributed architectures with the introduction of their own local and
global  networks.  In  such  an  environment,  their  terminals  can  be  upgraded  to  do  local 
word  processing.  In  addition,  they  can  provide  electronic  mail  plus  all  the  data 
processing  they  normally  provide.  Rather  than  allowing  inroads  into  their  data 
processing markets, they will try to capture the electronic mail, word processing and
office  information  systems  markets.  They  already  have  a  marketing  force  selling  to 
large  organizations.  In  addition,  they  have  existing  focal  points  of  activity  in  the 
computer  centers  of  many  organizations.  They  are  thus  in  a  good  position  to  promote 
their  products  in  the  office. 

However, much as they may like to be, data processing companies will not be
alone in the office information systems market. Why should the communications
companies stop only at providing electronic mail? Or, why should the office equipment
companies only provide office work stations? They can all expand their base of
operation to provide the other services required for office automation. For instance,
communications companies can also provide word processing, data base management
and data processing. The office equipment companies can introduce local
communications, data processing and, through interfaces to global satellite networks,
a total data processing and data communications environment for the office.

Each of these organizations has an established market which they will try to
retain while encroaching on everyone else's area and the new office information
systems  area.  In  order  to  move  into  their  non-expertise  areas  in  office  automation, 
they  will  have  to  acquire  the  appropriate  technologies.  However,  this  should  not  be 
difficult  to  do  and  points  to  the  possibility  of  other  participants  in  the  field.  For 
instance,  there  are  companies  which  either  have  knowledge  of  a  particular  office 
intensive application, e.g., banking, or they have money to diversify into a new and
exciting sector, e.g., oil companies. These organizations can and are buying the
technology  and  are  using  their  market  expertise  and  financial  power  to  enter  the  office 
automation  market. 

The  prospective  users  of  office  information  systems  will  soon  be  in  a  unique 
position  to  be  offered  essentially  competing  services  from  telecommunications,  office 
equipment,  data  processing  and  other  companies.  Each  company  will  position  its 
product  to  take  advantage  of  its  established  market.  For  instance,  telephone 
companies  can  integrate  voice,  message  and  data  communications.  Office  equipment 
companies can bundle copying, printing, word processing and calculating. Data
processing  companies  can  try  to  take  advantage  of  all  their  in-place  software  for 
sophisticated  applications.  Although  each  company  has  some  unique  advantages,  none 
is  guaranteed  to  capture  the  office  information  system  market.  All  of  them  are  very 
large  companies  which  means  that  they  will  compete  for  a  while.  Fortunately,  the 
market  is  immense  and,  therefore,  there  is  room  for  many  organizations. 

The  area  of  office  information  systems  offers  the  exciting  environment  of 
different ideas and technologies meshing together. With all this interaction and
cross-fertilization, many new ideas will come about. The companies competing in the market
will  have  to  provide  integrated,  or  at  least  readily  integratable,  systems.  They  will  have 
to  orient  their  systems  to  fit,  at  least  initially,  the  current  structure  of  the  office  and 
provide  flexibility  in  their  systems  to  allow  them  to  evolve  as  the  office  structure 
evolves. Finally, they will have to make their systems readily acceptable to the users,
the office workers. Thus, the area of office information systems is faced with four
major challenges that need to be met. The first three are those just discussed:
integration, office design and user requirements. The fourth concerns the
socio-economic problems that may arise from the introduction of office information systems.
We  will  discuss  these  four  issues  in  the  order  in  which  they  need  to  be  dealt  with  and 
not  necessarily  according  to  their  importance. 

3  INTEGRATION 

An  office  embodies  many  different  information  concepts,  processing  technologies, 
storage media and communication methods. Information concepts include such
notions  as  letters,  memos,  business  forms,  reports  and  pictures.  Processing 
technologies  found  in  an  office  encompass  typewriters,  copiers,  tape  recorders 
(dictaphones),  calculators,  adding  machines  and  specialized  computers.  Filing 
cabinets, book shelves, desks, paper, magnetic tape, magnetic disk and electronic
media constitute some of the storage media in an office. Finally, such things as
telephones,  postal  mail,  electronic  mail,  face  to  face  conversation,  recorded  messages, 
operations  manuals  and  visual  media  (pictures,  charts,  etc.)  are  used  to  communicate 
and  disseminate  information. 

Most  of  the  technologies  found  in  an  office  address  themselves  to  specific  aspects 
of  the  office  information  handling  problem.  Typewriters  produce  printed  text. 
Photocopiers  produce  duplicates.  Filing  cabinets  store  documents.  Postal  mail 
transmits  and  disseminates  information.  Until  recently,  there  has  been  very  little 
integration  between  these  different  technologies.  The  integration  was  provided  by  the 
human  element  typing  documents,  copying  them,  filing  them  and  mailing  them. 
Lately,  some  integration  has  started  to  occur  in  terms  of  word  processing  systems  and 
electronic  mail.  However,  there  is  a  potential  for  much  more  integration  in  office 
information systems and this must occur if the automated office is to be realized.
Integration must take place both within the four areas outlined above as well as
between  them.  We  outline  first  a  possible  scenario  for  integration  within  each  area  and 
then  discuss  the  integration  of  the  different  areas. 

In  terms  of  information  concepts,  a  unifying  concept  is  needed  for  integrating  the 
different  concepts  now  found  in  an  office.  This  unification  is  needed  to  provide  a 
uniform  environment  for  implementation  and  use  of  office  information  systems. 
Consider  the  concept  of  a  form  as  such  a  unifying  agent  [Tsichritzis,  1980].  Forms  can 
be  viewed,  at  two  extremes,  as  either  text  or  formatted  data.  In  addition,  they  can 
represent  various  combinations  of  text  and  formatted  data.  Thus,  all  of  letters, 
memos,  business  forms,  reports  and  pictures  can  be  captured  by  the  concept  of  a 
form.  In  the  automated  office,  however,  forms  correspond  to  electronic  images  of 
these  objects. 

Treating  all  of  the  information  concepts  as  forms  allows  a  uniform  interface  to  be 
provided  to  the  user.  The  user  does  not  have  to  learn  different  concepts  or  operations 
to  handle  letters,  memos,  reports,  etc.  Since  the  concept  of  a  form  is  also  very 
familiar  to  office  workers,  this  should  help  in  introducing  office  information  systems.  It 
is  possible  to  provide  operations  on  electronic  forms  that  closely  resemble  operations 
on  paper  versions  of  forms.  Thus  it  should  be  possible  to  find,  modify,  copy,  edit  and 
file  electronic  forms  in  the  same  way  that  these  operations  apply  to  manual  forms 
[Tsichritzis,  1979a;  Cheung  and  Kornatowski,  1980].  In  addition,  forms  can  be  used  to 
specify  more  general  office  procedures  [Hogg  and  Nierstrasz,  1980].  Thus,  for  example, 
an office worker could use forms to specify the operation of a billing system in an
organization. 
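
The idea of a form as a unifying concept can be illustrated with a minimal sketch. It is not taken from any of the cited systems; the names Form, FormFile, copy_form and modify, and the sample field names, are assumptions made purely for illustration. A form carries formatted fields, free text, or both, and the same small set of operations (file, find, copy, modify) applies regardless of whether the form is a memo or a business form.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, List


@dataclass(frozen=True)
class Form:
    """An electronic form: formatted fields plus an optional free-text body."""
    form_type: str                                  # e.g. "memo" or "invoice" (illustrative)
    fields: Dict[str, str] = field(default_factory=dict)
    text: str = ""


class FormFile:
    """A hypothetical electronic filing cabinet holding forms of any type."""

    def __init__(self) -> None:
        self._forms: List[Form] = []

    def file(self, form: Form) -> None:
        """Store a form, as one would file a paper form."""
        self._forms.append(form)

    def find(self, form_type: str, **criteria: str) -> List[Form]:
        """Return filed forms of the given type whose fields match all criteria."""
        return [
            f for f in self._forms
            if f.form_type == form_type
            and all(f.fields.get(k) == v for k, v in criteria.items())
        ]


def copy_form(form: Form) -> Form:
    """Duplicate a form; the copy gets its own field dictionary."""
    return replace(form, fields=dict(form.fields))


def modify(form: Form, **changes: str) -> Form:
    """Return a new form with some fields changed; the original is untouched."""
    return replace(form, fields={**form.fields, **changes})


# The same operations serve a mostly-text memo and a mostly-formatted invoice.
cabinet = FormFile()
memo = Form("memo", {"to": "Accounts"}, "Please review the attached invoice.")
invoice = Form("invoice", {"customer": "Acme", "amount": "120.00"})
cabinet.file(memo)
cabinet.file(invoice)
paid_invoice = modify(invoice, status="paid")
```

In this sketch a letter is simply a form with little formatted data and a long text body, while a business form is the reverse; the user-visible operations do not change.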

With the falling cost of computer processing power, most if not all of the data
processing  functions  within  an  office  can  be  done  by  computer.  Computers  are  already 
taking  over  the  text  preparation  functions  previously  done  by  typewriters  alone. 
Copying  and  calculating  can  also  be  done  more  conveniently  by  computer.  With  high 
speed,  high  quality  output  devices,  e.g.,  laser  printers,  it  is  more  convenient  to 
produce multiple copies through the computer than by photocopying. With the
integration of voice input to computers, voice storage and limited processing are now
possible.


Initially, the computer in the office will probably play a passive role. That is, all
actions  taken  need  to  be  initiated  by  the  users.  Eventually,  however,  the  computer  will 
take  a  more  active  role  to  become  the  controller  of  the  information  flow.  It  can  relieve 
the  office  worker  of  many  of  the  trivial  tasks  that  now  must  be  done.  For  example,  it 
can  act  as  a  coordinating  agent,  waiting  for  certain  inputs  to  arrive  or  actions  to 
happen and only notifying the user when all parameters for action are available. It can
disseminate information, like memos, automatically at the user's request according to
prespecified  criteria.  Languages  or  other  techniques  for  specifying  these  "active" 
procedures  need  to  be  developed. 
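
As a rough illustration of such an "active" procedure, the sketch below shows a coordinating agent that collects inputs as they arrive and notifies the user only once every required input is present. The class name Coordinator, its methods and the document names used are hypothetical; the paper does not prescribe a language or mechanism for this.

```python
from typing import Callable, Dict, Iterable, Set


class Coordinator:
    """Waits for a set of named inputs and notifies the user once all have arrived."""

    def __init__(self, required: Iterable[str],
                 notify: Callable[[Dict[str, str]], None]) -> None:
        self._required: Set[str] = set(required)
        self._received: Dict[str, str] = {}
        self._notify = notify
        self._done = False

    def arrive(self, name: str, value: str) -> None:
        """Record an arriving input; fire the notification exactly once when complete."""
        if self._done or name not in self._required:
            return                      # ignore unrelated or late inputs
        self._received[name] = value
        if self._required.issubset(self._received):
            self._done = True
            self._notify(dict(self._received))


# The user hears nothing until both documents are in hand.
notices = []
agent = Coordinator(["purchase_order", "delivery_slip"], notices.append)
agent.arrive("purchase_order", "PO-17")     # nothing reported yet
agent.arrive("delivery_slip", "DS-4")       # now the user is notified
```

The office worker is relieved of checking repeatedly whether everything has arrived; the prespecified criterion here is simply "all required inputs present", and richer criteria would need the kind of specification language the text calls for.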

To  be  useful,  data  in  an  office  information  system  should  be  consistent,  available 
when  required,  shared  by  many  users  and  managed  efficiently  and  effectively.  This 
implies  that  the  data  are  managed  by  a  data  base  management  system  (DBMS)  and 
stored  in  one  or  more  data  bases.  The  data  base(s)  along  with  the  DBMS  provide  the 
integrating  concept  for  data  storage  and  management  in  an  office  information  system 
[Tsichritzis,  1980].  The  fact  that  the  data  are  managed  by  a  DBMS  can  be  made  totally 
transparent to the users of the system [Cheung and Kornatowski, 1980]. The use of a
DBMS  also  allows  for  the  possibility  of  more  general  access  to  the  data  base  via  the 
query facilities of the DBMS, e.g., Uli'S/AlKS [Tsichritzis, 1980].

In  order  to  apply  data  management  techniques  successfully  to  the  office 
environment,  some  of  our  data  management  concepts  may  have  to  change  or  at  least 
be  modified.  Traditional  data  bases  and  data  models  are  very  much  dependent  on  the 
concept  of  records  (both  physical  and  logical)  and  how  they  are  stored  and  accessed. 
Query  languages  also  depend  on  the  record  concept  for  searching  purposes.  This  is 
fine  when  the  data  are  formatted.  However,  much  of  the  data  in  an  office  environment 
are non-formatted text or pictorial data. New or modified data models and query
languages will have to be developed to handle these types of data [Alio et al., 1978].

Network  communication,  via  local  or  global  networks,  provides  the  possibility  of 
simultaneous  communication  with  all  other  users  of  the  network.  Electronic  mail,  a 
form of network communication, can replace local office mail and possibly some
inter-office communications, e.g., postal mail, telephone calls. Thus the concept of network
communications  can  be  used  as  the  integrating  concept  for  communications  in  the 
automated  office. 

Communication  can  be  written,  verbal  or  pictorial.  There  is  no  reason  why  all  of 
these  cannot  be  handled  in  a  uniform  way  in  the  automated  office.  This  implies  an 
integration  effort  at  both  the  hardware  and  software  level.  Voice  input  needs  to 
interface  directly  with  the  computer  and  be  stored  and  transmitted  in  digital  form. 
Facsimile and graphics also need to be integrated with computer and network
communications  technology.  Appropriate  protocols  and  transformations  need  to  be 
developed  before  integration  can  occur. 

There is a tremendous need to apply software engineering as well as hardware
engineering  techniques  to  office  information  systems.  Within  the  different  areas  of 
office  information  systems  just  outlined,  many  different  concepts  and  technologies 
have  to  be  integrated.  Text  and  formatted  data  have  to  be  integrated  so  the  user  can 
move from word processing to data base management. Text preparation and direct
typesetting have to be integrated for the production of high quality paper output.
Message  preparation  and  message  communications  systems  have  to  be  integrated  so 
the user can create, send locally (through a local network) or send far (through a
global network). As well, the different areas have to be integrated with each other to
achieve  the  automated  office.  Data  base  management  and  network  communications 
have  to  be  integrated  to  allow  for  distributed  processing  and  inter-  and  intra-office 
communications.  Forms  handling  and  data  base  management  have  to  be  integrated  to 
provide  forms  management  and  an  easy  to  use  interface  to  the  DBMS. 

The  integration  within  and  between  the  areas  cannot  be  an  afterthought.  It 
requires  the  application  of  software  engineering  techniques  to  ensure  their  smooth 
interfacing  and  efficient  operation.  Since  most  of  the  technologies  evolved 
independently, there is a need for some transformations, e.g., between speech and
character  strings.  However,  if  the  transformations  between  interfaces  are  complicated 
and  poorly  designed,  there  will  be  performance  degradation  which  may  not  be 
acceptable in a real time application. The problem of software engineering in office
information  systems  is  somewhat  more  complex  than  in  most  software  systems  since 
there  is  a  need  to  integrate  many  and  different  technologies  in  a  common  environment. 
However, only if this integration takes place can the user realize the full potential of
office  information  systems. 


4  OFFICE  DESIGN 

The  organization  of  most  offices  evolves  over  a  long  period  of  time.  Changes  to 
office  procedures  or  organization  are  usually  introduced  gradually.  This  is  because  it  is 
fairly  difficult,  costly  and  traumatic  to  reorganize  offices  continually.  Even  gradual 
change  can  be  difficult  to  achieve.  If  new  procedures  are  introduced,  then  people  have 
to  be  retrained.  If  new  people  are  hired,  they  have  to  be  trained  in  the  current 
procedures  of  the  office.  Often  these  procedures  are  not  very  well  formalized  and  are 
acquired  more  by  osmosis  than  by  direct  training.  If  new  equipment  is  acquired,  office 
workers  have  to  be  trained  in  its  use.  Meanwhile,  activity  in  the  office  should  continue 
as  usual  without  major  disruptions  to  the  company’s  business  activities. 

For  these  reasons,  one  is  usually  very  wary  of  tampering  with  an  office’s 
organization  on  a  large  scale.  Because  of  this,  the  introduction  of  the  automated  office 
has  to  reflect,  initially  at  least,  current  office  organization  and  procedures  if  it  is  to  be 
successful.  As  a  first  step,  therefore,  we  need  to  capture  the  current  situation  and 
organization  in  an  office.  It  should  be  obvious  that  only  well-defined  portions  of 
conventional  offices  can  be  captured  effectively  for  automation.  Appropriate  models 
and techniques need to be developed to represent and analyze those well-defined
aspects of an office [Ellis, 1979; Tsichritzis, 1979b; Ladd and Tsichritzis, 1980].

An office organization usually reflects present and/or past goals of the business
as  well  as  present  and/or  past  personalities  within  the  office.  Because  of  the 
evolutionary  nature  of  offices,  this  office  organization  will  usually  lag  behind  current 
company  objectives  and  priorities.  Thus  it  may  reflect  goals  and  operating  constraints 
that are not in existence now. These goals and constraints may have changed in time
or  they  may  be  irrelevant  in  an  automated  office.  However,  they  should  be  captured  as 
a  first  step  in  developing  a  model  of  the  office.  In  addition,  if  the  automated  office,  at 
least  in  procedures  and  organization,  functions  in  a  manner  similar  to  the  conventional 
office,  users  should  not  feel  as  threatened  by  its  introduction. 

Because of their resilience, people in a conventional office manage to cope in the
face  of  bad  design  and  loosely  organized  situations.  Automated  systems  usually  have 
much  less  tolerance  for  bad  design  and  require  much  tighter  control  over  organization 
and  procedures.  In  an  analogy,  a  small  artisan's  workshop  need  not  be  organized.  An 
assembly  line  on  the  other  hand  needs  expert  organization  to  function  properly.  These 
situations  of  bad  design  and  loose  control  will  soon  manifest  themselves  in  the 
automated office. Thus the need for office redesign will become inevitable.

An automated office can allow much more flexibility in terms of office design.
Take, for instance, the concept of a maximum and minimum load to keep a person
occupied.  In  a  conventional  office  this  is  handled  by  creating  parallel  processing  paths, 
e.g.,  hiring  extra  people  if  the  load  is  too  great,  or  concentrating  activities  in  the  same 
person  if  the  load  is  too  small.  In  an  automated  office,  one  can  introduce  bigger 
machines,  or  more  automation  to  alleviate  bottlenecks.  In  this  way  restructuring  may 
be  avoided  or  it  may  be  forced. 

Tools and techniques have to be established for analyzing information flow and
processing capabilities in the automated office. These tools and techniques will depend
on the particular properties of the office that we would like to analyze. For example,
we can look at data flow, processing capacity or coordination of activities in the office.
Techniques exist in other areas of computer science and engineering for analyzing
these situations in different contexts. Research is needed to determine how these
techniques can be adapted to the problem of office information systems [Ellis and Nutt,
1980].

Some  preliminary  work  along  these  lines  has  already  been  done.  For  instance,  we 
can  analyze  flow  of  documents  using  graph  theory  and  commodity  flow  analysis.  We 
can  study  bottlenecks  using  queueing  network  analysis.  Finally,  we  can  study 
coordination  using  asynchronous  models  like  Petri  nets  [Zisman,  1977].  In  each  case 
we  analyze  the  flow  of  information  and  the  loads  on  work  stations  and  people.  The  aim 
is  to  try  to  perform  restructuring  of  the  activities,  which  relates  to  some  office 
functions, while optimizing certain cost functions [Ladd and Tsichritzis, 1980].
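
As a rough illustration of the queueing view, the sketch below treats each work station as a group of identical servers and computes its utilization, rho = arrival rate / (servers x service rate). The station with the highest utilization is the likely bottleneck. The station names and rates are invented for the example and are not taken from any of the cited studies.

```python
from dataclasses import dataclass


@dataclass
class Station:
    """One work station in the office (names and rates are hypothetical)."""
    name: str
    arrivals_per_hour: float   # documents arriving per hour
    service_per_hour: float    # documents one person can handle per hour
    servers: int = 1           # people (or machines) working in parallel

    def utilization(self) -> float:
        # rho = lambda / (c * mu); a value of 1 or more means the queue keeps growing
        return self.arrivals_per_hour / (self.servers * self.service_per_hour)


def bottleneck(stations):
    """Return the station with the highest utilization."""
    return max(stations, key=lambda s: s.utilization())


office = [
    Station("mail room", arrivals_per_hour=30, service_per_hour=40),
    Station("order entry", arrivals_per_hour=30, service_per_hour=12, servers=2),
    Station("billing", arrivals_per_hour=30, service_per_hour=35),
]

worst = bottleneck(office)
print(worst.name, round(worst.utilization(), 2))
```

With these assumed figures, order entry is overloaded (utilization 1.25). Adding a third person there lowers its utilization to about 0.83, which corresponds to the "parallel processing paths" remedy described earlier for the conventional office.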

Conventional offices are hard to reorganize because of investment in people and
materials. It is costly to retrain people and to redesign office procedures and working
materials, e.g., business forms. In an automated office, reorganization may be much
simpler. The reorganization can be in terms of automated procedures and hardware,
rather  than  in  terms  of  people  and  materials.  Automated  offices  should  also  allow  for 
more  complex  procedures  to  be  used  since  the  complexity  can  be  in  terms  of  the 
software  and  hardware  while  still  providing  a  simple  interface  to  the  user.  This  latter 
aspect  of  office  information  systems  is  perhaps  the  most  important  if  users  are  to 
accept  the  automated  office  as  part  of  their  environment. 


5 USER REQUIREMENTS

The proliferation of the automated office will probably be paced by users’
acceptance.  In  order  to  minimize  user  resistance,  it  is  important  to  provide  an 
environment  which  is  satisfactory  to  the  users.  The  requirements  of  users  in  the  office 
will have to be captured in an appropriately descriptive office model and expressed as
capabilities  of  the  office  information  system.  Human  factors  engineering  will  play  a 
very  important  part  in  translating  user  requirements  into  appropriate  facilities  in  the 
automated  office. 

The  concepts,  methods  and  technologies  in  a  conventional  office  have  evolved 
over  time  as  the  need  for  them  dictated.  The  users  have  had  a  long  time  to  get  used  to 
them  and  to  master  them.  In  the  past,  most  of  the  technology  and  processing  methods 
were  fairly  simple  to  master  by  most  users.  Users  did  not  require  a  great  deal  of 
sophistication.  Office  automation  introduces  more  concepts  and  technologies.  These 
changes  have  to  be  introduced  and  provided  in  a  way  that  the  users  can  cope  with 
them. 

To  determine  how  the  trauma  of  office  automation  can  be  lessened,  it  is 
important  to  investigate  and  understand  potential  sources  of  user  resistance.  Users 
may  react  unfavorably  to  office  automation  for  many  reasons.  The  initial  training  may 
have a negative influence. The perception that one becomes a slave of an automated
assembly  line  of  office  procedures  may  be  a  strong  deterrent.  The  accountability 
which  the  system  may  force  and  the  formalization  of  office  procedures  may  seem 
rather  dangerous  to  users.  Finally,  people  may  have  a  fear  of  eventually  being 
displaced  in  part  by  these  systems. 

These  concerns  point  to  the  fact  that  office  information  systems  must  provide 
very  friendly  and  easy  to  use  end-user  interfaces.  The  user  must  feel  that  the  system 
is  merely  a  tool  to  be  used  in  the  performance  of  his/her  job  and  is  not  coercive  in  any 
way.  The  tools  must  be  not  only  easy  to  use,  but  must  be  the  right  tools  required  by 
the  user.  Many  diverse  capabilities  are  needed.  As  much  as  possible,  however,  the 
different  capabilities  should  be  provided  in  a  uniform  way  so  that  the  user  is  not 
required  to  learn  umpteen  different  conventions  [Morgan,  1980]. 

An  office  information  system  will  have  to  provide  a  facility  for  querying  the  data 
base  to  retrieve  data  or  perform  some  office  procedure.  This  facility  can  be  in  the 
form of a dialog language [Codd et al., 1970], form filling [Hammer et al., 1977;
Tsichritzis, 1979a; Lefkovits et al., 1979] or a simple command language [Negroponte et
al., 1979]. A good deal of work has been done on the design and human factors aspects
of data base query languages [Shneiderman, 1975]. Most of this research, however, has
been oriented towards fairly sophisticated users when compared with office workers.
Similar  studies  need  to  be  done  for  query  languages  for  office  information  systems. 

It  seems  likely  that  at  least  some  office  information  systems  will  be  implemented 
using micro-computers in a distributed environment [Ellis and Nutt, 1980; Cheung and
Kornatowski, 1980]. In this case, local network communications will have to be
provided  to  send  data  from  user  to  user.  The  facilities  for  this  electronic  mail 
capability  will  have  to  hide  the  network  aspects  of  the  communications  from  the  user. 
Sending electronic mail in a network should not be any more difficult than addressing
and mailing a letter or sending a memo and should probably be a great deal simpler.

Pictorial  communication,  e.g.,  graphics  and  pictures,  and  speech  communication, 
e.g.,  recorded  messages  and  telephone  calls,  also  need  to  be  available  in  the  office 
information  system  interface.  People  have  many  ways  to  communicate  ideas  and 
messages. Office information systems cannot limit them to just one, e.g., printed text.
They should be able to talk, point, relate images, etc. [Herot, 1980; Negroponte et al.,
1979].  This  implies  that  such  things  as  speech  equipment  (telephones),  visual  displays 
(TV  screens),  facsimile  and  character  text  on  paper  have  to  be  available  and  integrated 
in  the  user  interface. 

The  application  of  human  factors  techniques  to  the  design  of  office  information 
systems is of vital importance if the automated office is to gain widespread acceptance.
There  is  a  tremendous  need  to  perform  many  human  factors  studies  of  different 
designs  to  determine  the  best  way  of  providing  the  facilities  of  an  office  information 
system  to  the  users.  Both  hardware  and  software  human  factors  experiments  are 
needed. 

We  need  to  determine  the  best  way  for  office  workers  to  interact  with  the  system. 
Should  it  be  by  menu  selection,  command  buttons,  pointing,  speaking,  joysticks,  some 
combination  of  these  or  other,  different  ways?  Within  any  interface,  a  hierarchy  or 
layer of facilities needs to be provided so that first-time users can be guided by the
system  while  more  experienced  users  are  not  unduly  annoyed  by  the  system 
interaction.  Facilities  for  interaction  should  be  provided  in  a  uniform  way  within  levels 
as  well  as  between  levels.  This  implies  that  appropriate  software  engineering 
techniques  have  been  applied  to  permit  a  layered  approach  to  interface  design.  It  also 
implies  that  hardware  design  and  software  design  proceed  hand-in-hand  so  that  they 
can  be  properly  integrated.  In  order  to  test  different  interface  designs,  appropriate 
tools  have  to  be  developed  that  will  allow  a  design  to  be  put  together  quickly  and  its 
behaviour  simulated  without  necessitating  a  complete  implementation.  In  this  way, 
many  different  designs  can  be  more  easily  evaluated  and  the  probability  of  coming  up 
with  a  good  design  can  be  greatly  enhanced. 

To  be  successful,  office  information  systems  first  and  foremost  have  to  meet  the 
user’s requirements. The users of office information systems will not be very
sophisticated in terms of computer hardware and software technology. They will not
want  to  have  to  master  a  lot  of  different  approaches  and  technologies.  They  will  not 
want  to  have  to  understand  the  intricacies  of  speech  encoding,  data  communications 
and  text  management.  However,  they  will  want  to  use  word  processing,  electronic  mail, 
filing  and  other  capabilities  in  a  common,  simple  to  use  interface.  Human  factors, 
software and hardware engineering have to be applied to obtain a satisfactory solution.
The  solution  will  not  be  of  much  use  if  the  users  are  unwilling  to  accept  it. 


6 SOCIO-ECONOMIC PROBLEMS


Office automation will have a great impact on the life of office workers. Their
working conditions will definitely change. Whether they will change for the "better" or
"worse"  depends  on  one’s  definition  of  "better".  In  economic  terms,  the  changes 
should  be  for  the  better  as  far  as  reduced  costs  and  greater  productivity  go.  Socially, 
in  terms  of  job  satisfaction  and  human  interactions,  there  are  many  unanswered 
questions.  Much  will  depend  on  whether  the  office  worker  sees  office  automation  as  a 
threat  to  his  job  and  to  his  self  worth.  It  already  seems  to  be  the  case  that  very 
talented,  productive  people  benefit  from  office  automation  tools,  since  they  can 
concentrate  on  the  ideas  and  not  their  mundane  representation  and  dissemination. 
However,  less  imaginative  people,  who  had  a  comfortable  feeling  about  participating  in 
a  meaningful  way  in  an  important  activity,  may  feel  displaced. 

Conventional offices provide a great deal of flexibility in the utilization of office
workers.  For  example,  during  times  of  high  unemployment,  it  is  often  the  goal  of  an 
office  to  utilize  as  many  people  as  possible  in  meaningful  activities.  This  contrasts 
sharply with an automated office which tries to minimize user activities and operate
with the smallest number of people. Office automation may also make people feel like
part  of  an  assembly  line.  Everyone  is  expected  to  do  a  fixed  job  with  little  individual 
initiative  allowed.  This  contrasts  with  a  conventional  office  where  people  have  special 
talents  and  shortcomings  to  which  an  office  organization  tries  to  adapt.  Office 
automation  will  have  to  take  such  issues  into  account. 

There  is  always  a  fear  that  office  automation  will  displace  many  workers. 
However,  current  office  workers,  at  least  initially,  probably  will  not  be  displaced.  A 
more  probable  scenario  is  that  people  will  have  to  forego  a  number  of  mundane 
activities  which  may  be  boring,  but  at  the  same  time  they  are  relaxing  since  they  do 
not  tax  the  intellect  tremendously.  Hence,  the  automated  office  may  be  more  stressful 
to  workers  unless  this  factor  is  considered. 

Any  automation  process  raises  the  possibility  of  increased  productivity.  However, 
it  is  not  always  true  that  this  will  automatically  imply  improved  quality.  Technological 
advances  have  allowed  the  production  of  more  TV  programs  and  commercial  films.  It  is 
debatable  whether  this  has  resulted  in  an  improved  product.  (Many  people  will  argue 
to  the  contrary!)  Thus,  there  is  a  danger  that  in  an  automated  office  people  will 
concentrate  more  on  superficial  quality,  e.g.,  color  printouts,  sophisticated  formatting, 
rather  than  the  information  content  of  the  messages  or  the  relevance  of  the  office 
procedures. 


The need for formal offices as we know them today may disappear when everyone
has their own personal computer tied in to a communications network. There have
always been prophecies of the office in the home. Perhaps office automation will be the
realization  of  this  scenario.  If  so,  what  will  be  the  social  implications  of  the  demise  of 
the  formal  office  where  many  friendships  and  social  contacts  are  made?  Even  if  we  do 
not  eliminate  formal  offices,  will  we  require  as  much  office  space  as  we  do  now  for  office 
equipment  and  storage  of  documents? 

Information systems have sometimes been decried for concentrating too much
power  in  a  few  individuals.  Are  office  systems  another  step  in  this  direction?  When 
everyone  is  connected  to  a  computerized  terminal,  there  is  the  potential  for  covert 
surveillance  of  work  habits  and  productivity.  Perhaps  laws  are  required  detailing  what 
types  of  facilities  office  information  systems  can  provide  to  management  for  these 
types  of  activities. 

Bureaucracies seem to thrive on the situation of lack of accountability. Office
information systems can dramatically change this situation. How will bureaucracies be
affected  by  their  introduction?  We  may  get  better  service  but  will  it  be  more 
satisfying?  Studies  have  shown  that  people  are  generally  not  happy  with  computer 
printouts, even if they have been personalized [Morgan, 1980]. However, with advances
in  technology  it  will  soon  be  difficult  to  distinguish  between  computer  generated  output 
and  typed  text. 

The  automated  office  has  many  implications  for  the  structure  of  society  as  we 
know  it  today.  There  are  many  questions  that  need  to  be  answered  before  we  plunge 
headlong toward the automated office. The technical problems, although many and
challenging, are not insurmountable. If the automated office is not realized it will not
be because it was not technically feasible. Rather, it will be because people did not
want  it. 


7  CONCLUDING  REMARKS 

Office automation and office information systems are in their infancy. The
technology  exists  and  the  market  forces  seem  favorable  for  their  development  and 
introduction. Because of the nature of the market, many different approaches will be
proposed  and  implemented.  The  users  should  have  several  options  to  choose  from  in 
deciding  on  the  acquisition  of  an  office  information  system. 

Although  the  technology  and  market  exist,  much  research  into  office  information 
systems is still required. There is a need to apply human factors, software and
hardware engineering techniques to design and implement office information systems.
The systems must integrate several different technologies at the user, software and
hardware levels. The user interface must be human engineered to be acceptable. The
systems  have  to  be  flexible  to  allow  for  the  possibility  of  reorganization  of  functions  and 
capabilities  as  the  needs  of  the  office  change. 

All  offices  are  not  alike.  Some  will  require  more  functional  capabilities  than 
others. Some will require similar functional capabilities, but provided in slightly
different ways. One cannot hope to design the office information system that will be
applicable to all situations. Instead, it is necessary to identify and provide the
appropriate functional capabilities required in an office. Software engineering
techniques  need  to  be  applied  to  design  the  functional  parts  so  that  they  will  interface 
smoothly  with  each  other  and  so  that  they  can  be  easily  modified  to  meet  specific 
requirements.  In  this  way,  the  office  system  designer  will  have  the  appropriate 
building  blocks  and  tools  with  which  he  can  customize  the  design  to  suit  a  particular 
situation.

We are poised for another industrial revolution; this time in the office. To what
degree  it  will  materialize  and  how  fast  it  will  come  is  debatable.  People  using  the 
systems  as  much  as  people  generating  them  will  decide. 


ABSTRACT


It  is  generally  recognized  that  human  factors  issues  are  very  important  in  the 
design  and  implementation  of  computer  software.  In  the  past,  however,  human  factors 
issues  were  studied  mainly  after  the  fact  so  that  their  effect  on  software  systems  was 
often minimal. Office information systems are now just in their infancy. Because of
their potential wide impact and large user community, human factors issues are of
paramount  importance  in  these  systems.  In  this  paper  we  outline  those  issues  in  office 
information  systems  that  require  human  factors  research. 


1 INTRODUCTION

When  we  design  a  computer  system  for  use  by  people,  we  usually  have  two 
objectives  in  mind.  First,  we  would  like  the  computer  system  to  fit  the  application  and, 
second,  we  would  like  it  to  fit  the  people  (end-users)  who  will  use  it.  To  meet  the  first 
objective,  we  study  the  interaction  between  the  application  and  the  computer  system, 
noting  the  characteristics  of  each  and  trying  to  make  them  mesh.  To  meet  the  second 
objective,  we  study  the  characteristics  of  the  interaction  between  the  computer 
system  and  the  end-users  and  try  to  make  them  mesh.  This  latter  activity  is  called 
human factors [Lochovsky, 1978].

Office information systems are computer systems designed to bring "automation"
to  the  office.  They  computerize  such  diverse  facilities  as  word  processing,  electronic 
mail,  forms  processing,  personal  computing,  data  base  management  and  office 
procedures.  In  addition,  the  goal  is  to  provide  all  these  facilities  in  one  integrated 
system  that  will  help  the  office  worker  in  the  performance  of  his/her  job  [Tsichritzis 
and  Lochovsky,  1980]. 

In the past, researchers in computer science have worked on human factors issues
in many areas, most notably in programming languages [Weinberg and Schulman, 1974]
and  data  base  management  [Shneiderman,  1978].  The  emphasis  has  been  on 
evaluating  how  easy  the  languages  or  systems  are  to  use  by  untrained  end-users. 
However,  these  evaluations  are  usually  performed  on  "finished"  systems.  As  a  result, 
the  impact  of  human  factors  research  has  been  more  of  an  expository  nature  than  an 
integral  part  of  the  design  process.  (There  have  been  some  exceptions,  e.g.,  [Reisner, 
1977].) 

We  see  the  issues  in  designing  office  information  systems  that  are  acceptable  to 
the intended user population as similar to those in data base management. The scope
and  complexity  of  the  issues,  however,  are  much  greater.  In  data  base  management, 
the  end-user  population  is  usually  composed  of  computer  professionals.  These  people 
understand  computers  and  are  dedicated  to  their  use.  Thus,  they  can  always  be  made 
to  understand  the  "system"  problems  and  be  trained  to  overcome  any  idiosyncrasies  of 
the  system. 

Office  workers  are  not  computer  professionals.  We  cannot  expect  them  to  always 
be  able  to  understand  how  the  system  functions  or  to  provide  them  with  extensive 
training  to  master  hard  to  use  facilities.  We  need  to  provide  them  with  systems  that 
are  natural  to  them  and  are  easy  to  learn  to  use  and  easy  to  use.  In  the  design  of  office 
information  systems,  the  resolution  of  human  factors  issues  will  be  of  paramount 
importance  if  the  office  is  to  be  automated  successfully.  In  the  following  sections  we 
will outline the issues that require investigation in the design of office information
systems. 


2 DATA MODELS

Normally  when  there  is  data  to  be  stored  and  processed,  we  like  to  organize  the 
data  in  some  way.  To  determine  how  best  to  organize  the  data,  we  analyze  it  to  see  if 
there  is  any  pattern  to  the  data  occurrences  or  relationships  among  the  data 
occurrences.  We  then  abstract  these  patterns  of  occurrences  and  relationships  into 
categories  of  data  and  categories  of  relationships  among  them.  The  ways  in  which  we 
can construct these categories of data define a data model.

In  data  base  management,  several  data  models  have  been  developed  for 
organizing  computerized  data  [Tsichritzis  and  Lochovsky,  1901].  For  example,  there 
are  hierarchical  data  models  (hierarchies  of  data),  network  data  models  (general 
interconnections  of  data)  and  relational  data  models  (tables  of  data).  These  data 
models  are  very  general  to  allow  a  wide  range  of  applications  to  be  represented.  In 
addition,  many  of  them  are  more  oriented  toward  making  it  easier  for  the  computer  to 
process  data  than  for  representing  the  user’s  information  requirements  [Codd  and 
Date,  1974]. 
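
To make the contrast concrete, the sketch below stores the same small set of facts (which employee belongs to which department) in a hierarchical, a network and a relational style, and answers the same question against each. The department and employee names, and the dictionary layouts themselves, are assumptions made purely for illustration; real data base systems use far richer structures.

```python
# Hierarchical: each child record is stored under exactly one parent.
hierarchical = {
    "Sales": {"employees": ["Adams", "Baker"]},
    "Billing": {"employees": ["Clark"]},
}

# Network: records are nodes, and explicit links connect them.
network = {
    "records": {"d1": "Sales", "d2": "Billing",
                "e1": "Adams", "e2": "Baker", "e3": "Clark"},
    "links": [("d1", "e1"), ("d1", "e2"), ("d2", "e3")],
}

# Relational: flat tables; relationships are expressed by shared values.
relational = {
    "department": [("Sales",), ("Billing",)],
    "employee": [("Adams", "Sales"), ("Baker", "Sales"), ("Clark", "Billing")],
}


def employees_hier(dept):
    # Navigate from the parent record down to its children.
    return hierarchical[dept]["employees"]


def employees_network(dept):
    # Find the department record, then follow its outgoing links.
    records = network["records"]
    dept_ids = {rid for rid, value in records.items() if value == dept}
    return [records[dst] for src, dst in network["links"] if src in dept_ids]


def employees_relational(dept):
    # Select the rows of the employee table whose department value matches.
    return [name for name, d in relational["employee"] if d == dept]


print(employees_hier("Sales"))
print(employees_network("Sales"))
print(employees_relational("Sales"))
```

All three functions return the same answer, but each forces the user to think about the data in a different way, which is the point at issue when choosing a model for office workers.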

We  cannot  expect  that  office  workers  will  understand  or  find  natural  to  use  the 
data  models  used  traditionally  in  data  base  management.  But  what  kinds  of  data 
models  are  appropriate  for  office  information  systems?  How  does  the  worker  in  the 
office  perceive  and  organize  data?  New  data  models  have  to  be  developed  that  are 
appropriate  to  the  office  and  to  the  types  of  people  that  will  be  using  office  information 
systems.  These  data  models  will  have  to  be  much  more  flexible  than  data  base 
management data models. The kinds of data handled by these data models, i.e., highly
formatted  data,  are  only  one  type  of  data  found  in  an  office.  Data  models  for  office 
information  systems  should  also  be  able  to  handle  non-formatted  text  data,  picture 
data,  voice  data,  notations  to  data  (both  text  and  voice),  etc.  Not  only  should  they  be 
able  to  handle  these  types  of  data,  but  the  way  in  which  they  are  represented  in  the 
data  model  should  be  easily  understood  and  natural  to  the  people  that  will  be  using  this 
data.  There  is  a  need,  therefore,  when  developing  data  models  for  office  information 
systems  to  perform  human  factors  testing  to  determine  the  suitability  of  the  data 
model  for  the  intended  user  population. 

The  notion  of  forms  as  a  data  model  for  office  information  systems  has  been 
proposed  explicitly  or  implicitly  by  several  researchers  [Ellis  and  Nutt,  1980;  Hammer 
et al., 1977; Lefkovits, 1979; Tsichritzis, 1980a,b]. Forms appear to be a natural
candidate since they are familiar to office workers and can represent much of the data
that is captured and processed in an office. However, paper-based forms have many
properties and it is important that the right ones be captured in a forms data model.
In one case, a forms data model was developed that did not allow multiple-valued fields.
It  was  discovered  that  this  is  a  capability  present  in  paper-based  forms  that  users 
really  needed  and  should  have  been  included  in  the  forms  data  model  [Tsichritzis, 
1980b]. In data base management, data models often constrain the user to represent
an application in ways that are not natural to him [Lochovsky, 1978]. It is important
that  this  same  mistake  not  be  made  in  office  information  systems. 
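
The multiple-valued field issue can be sketched as follows. This is a minimal illustration of a form type whose fields may be declared single-valued or multiple-valued; the class names, the `multiple` flag and the sample purchase order are hypothetical and do not reproduce the forms data model of the cited work.

```python
from dataclasses import dataclass


@dataclass
class FieldSpec:
    """Description of one field on a form (hypothetical structure)."""
    name: str
    multiple: bool = False   # True if the field may hold several values


@dataclass
class FormType:
    """A form type is a named list of field descriptions."""
    name: str
    fields: list

    def new_instance(self):
        # Single-valued fields start empty (None);
        # multiple-valued fields start as an empty list.
        return {f.name: ([] if f.multiple else None) for f in self.fields}


def fill(form_type, instance, field_name, value):
    """Enter a value, appending if the field is multiple-valued."""
    spec = next(f for f in form_type.fields if f.name == field_name)
    if spec.multiple:
        instance[field_name].append(value)
    else:
        instance[field_name] = value


order_form = FormType(
    "purchase order",
    [FieldSpec("customer"), FieldSpec("item", multiple=True)],
)

order = order_form.new_instance()
fill(order_form, order, "customer", "Acme Ltd.")
fill(order_form, order, "item", "ribbon")
fill(order_form, order, "item", "paper")
print(order)
```

A model without the `multiple` option would force the user to fill in one form per item, which a paper purchase order does not require.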

Forms, while adequate for representing much of the data in an office, cannot
represent  all  of  it.  More  general  data  models  need  to  be  investigated  that  can  capture 
additional  information  handling  aspects  of  an  office  [Tsichritzis,  1980b].  These  data 
models  need  to  be  evaluated  not  only  in  terms  of  their  power  of  representation,  but 
also  in  terms  of  their  acceptability  to  the  end-users. 


3 QUERY LANGUAGES

Once  we  have  organized  the  data  according  to  a  data  model,  we  would  like  to  be 
able  to  manipulate  it  to  access  data  or  to  modify  it  in  some  way.  The  operations  that 
allow us to do this define a query language. In terms of office information systems, we
are  interested  in  query  languages  that  can  be  used  interactively  and  the 
characteristics  that  such  query  languages  should  have  given  the  characteristics  of  the 
end-users. 

Traditionally in data base management, query languages have been English-keyword
oriented. That is, the user formulates requests in a restricted, English-like
grammar  where  specific  words  (keywords)  signal  the  start  of  specific  parts  of  a 
request.  Constructs  in  the  language  must  appear  in  a  fixed  order  and  in  a  fixed  format. 
In recent years other types of query languages have emerged that allow requests to be
formulated  by  giving  the  system  an  example  of  the  reply  to  a  question  [Zloof,  1977]. 
For  example,  templates  of  tables  can  be  displayed  on  a  screen  and  the  user  fills  in  the 
table  according  to  which  columns  he  wants  in  the  reply  and  the  kinds  of  values  desired 
in  each  column.  Menu-based  query  languages  and  graphics-based  query  languages 
have  also  been  proposed,  but  not  used  widely  [McDonald  and  Stonebraker,  1976;  Ellis 
and  Nutt,  1980]. 
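
The by-example style described above can be sketched in a few lines. Here the user fills in a template with one entry per column: a marker meaning "show this column", a constant meaning "only rows with this value", or nothing. The `P.` marker is borrowed loosely from the by-example literature; the rest of the notation, the function name and the sample table are simplifications invented for this sketch.

```python
PRINT = "P."  # template entry meaning "include this column in the reply"


def query_by_example(table, template):
    """Answer a filled-in template against a table (a list of row dicts)."""
    # Columns the user asked to see.
    wanted = [col for col, entry in template.items() if entry == PRINT]
    # Columns where the user typed a constant act as selection conditions.
    conditions = {col: entry for col, entry in template.items()
                  if entry not in (PRINT, None)}
    return [
        tuple(row[col] for col in wanted)
        for row in table
        if all(row[col] == value for col, value in conditions.items())
    ]


employees = [
    {"name": "Adams", "dept": "Sales", "room": 101},
    {"name": "Baker", "dept": "Sales", "room": 102},
    {"name": "Clark", "dept": "Billing", "room": 201},
]

# "Show the name and room of everyone in Sales."
template = {"name": PRINT, "dept": "Sales", "room": PRINT}
result = query_by_example(employees, template)
print(result)
```

The user never states the order of clauses or remembers keywords; the layout of the table itself carries the structure of the request.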

It  is  not  clear  which,  if  any,  of  these  types  of  query  languages  are  suitable  for 
office information systems. Before we can answer this question, we first need to
determine  what  types  of  querying  facilities  are  required  by  office  workers.  Do  they 
require  the  sophisticated  facilities  provided  by  data  base  management  query  languages 
or will simpler facilities do? Are the characteristics of current query languages
suitable  for  the  office?  Can  they  be  used  by  people  eight  hours  a  day  in  performing 
their  job? 

The  facilities  provided  by  query  languages  for  data  base  management  systems 
need  to  be  general  to  handle  diverse  applications.  In  office  information  systems  the 
query languages can be much more specialized. It would seem unlikely that English-keyword
query languages would be suitable. They usually employ complex constructs
that  require  some  training  to  master  [Reisner,  1977;  Lochovsky,  1978].  In  addition, 
they  require  a  good  deal  of  input  (typing)  by  the  user  and  do  not  have  particularly  good 
interaction  characteristics.  By-example  query  languages  may  be  appropriate  if  they 
can  be  tailored  to  the  data  model  and  the  types  of  operations  required  in  office 
information  systems. 

It is important not to be restricted by past experience in designing query
languages  for  office  information  systems.  Much  more  sophisticated  technology  is  now 
available  and  it  should  be  exploited  wherever  possible.  Many  types  of  technologies  such 
as  graphics,  sound  and  colour  can  be  combined  to  provide  a  very  user-oriented  query 
language  [Herot,  1980].  Research  is  needed  to  determine  how  best  to  combine  these 
facilities  to  provide  the  types  of  capabilities  required  in  office  information  systems. 


4  COMMUNICATION 

The objective of gathering and processing data is usually to communicate the
results  of  the  processing  to  people  so  that  they  can  make  decisions.  Traditionally, 
these  results  are  produced  as  printed  reports  and  disseminated  by  hand  to  the 
appropriate  individuals.  With  today’s  technology,  electronic  mail  and  networks  can  be 
used  to  disseminate  information  electronically.  These  capabilities  have  to  be 
integrated into an office information system. It is important to study the
characteristics  of  the  communication  processes  in  an  office  so  that  they  can  be 
supported  in  a  natural  way  by  office  information  systems. 

People  have  many  ways  to  communicate,  written,  verbal,  visual,  etc.,  and  as  many 
of  these  as  possible  should  be  supported  by  an  office  information  system.  To  support 
these  different  modes  of  communication,  different  technologies  will  have  to  be 
integrated  into  a  uniform  communication  capability.  It  is  important  that  these 
facilities  be  provided  in  a  way  that  is  acceptable  to  the  end-users.  Studies  are  needed 
on the design of hardware and software components of the communications facility of
an  office  information  system. 

As  well  as  modes  of  communication,  people  also  have  different  levels  of 
communication,  e.g.,  formal,  informal  and  personal.  An  office  information  system 
should be able to distinguish between these different levels and be able to support
them.  The  last  level  of  communication  may  be  particularly  important  to  the 
acceptability  of  office  information  systems  to  the  users.  One  may  naturally  be  inclined 
to exclude this type of communication from the system since it is not "productive" to
the organization. However, people have a need to communicate personally and if they
cannot do it through the system, they will do it outside the system. Therefore, simply
as  an  incentive  to  use  the  system,  it  may  be  important  to  support  personal 
communication  and  even  encourage  it. 

As well as determining how people communicate in an office, studies are required
to  determine  how  best  to  provide  the  means  of  communication  within  an  office 
information  system.  It  should  be  no  more  difficult  to  communicate  with  someone 
through  an  office  information  system  than  it  is  to  do  it  via  the  telephone  or  by  walking 
down  the  hall.  These  facilities  need  to  be  integrated  with  the  data  model  and  query 
language  facilities  so  that  their  use  is  compatible  with  these  other  facilities. 


5 INTERACTION

An  office  is  not  merely  a  place  to  do  work.  It  is  a  social  environment  in  which 
people  must  feel  happy  and  useful  if  work  is  to  be  done  at  all.  The  introduction  of  office 
information  systems  could  have  a  positive  effect  or  a  negative  effect  on  this 
environment.  A  great  deal  depends  on  how  the  issues  outlined  in  the  previous  sections 
are  resolved.  For  this  reason,  it  is  important  when  dealing  with  these  issues  to  always 
consider  the  larger  issue  of  the  office  as  a  social  environment.  There  are  many 
interactions  that  take  place  within  this  environment,  e.g.,  person  to  person,  person  to 
computer,  etc.,  that  need  to  be  considered  in  designing  office  information  systems. 

The way in which the user interacts with the system, as much as the types of
facilities provided, will be a crucial factor in the system's acceptability. The interaction
should  be  interesting  and  not  tedious.  The  user  should  be  able  to  interact  with  the 
system  in  different  ways  depending  on  his  mood,  the  type  of  work  to  be  done,  etc. 
Above  all,  the  user  should  always  feel  that  he  is  the  one  who  is  in  control  of  the 
interaction, not the system. Even though the system is always in control ultimately, it
should make the user feel he is in control. The responsiveness and helpfulness of the
system  are  factors  that  can  contribute  to  a  feeling  of  control. 

The  personal  interactions  in  an  office  are  as  important  as  the  business 
interactions. We have already indicated that office information systems should support
personal  communication  between  people.  Thought  should  be  given  to  providing 
recreational  support.  For  example,  the  system  could  support  games  between  it  and 
people  as  well  as  between  people.  Such  support  could  have  an  educational  as  well  as  a 
recreational  value  in  that  it  can  be  a  way  for  people  to  learn  to  use  the  system’s 
facilities  in  a  non-pressured  mode  of  interaction. 


6  SUMMARY 

In  the  past  systems  were  often  designed  without  much  thought  given  to  human 
factors  issues.  Subsequent  evaluations  have  shown  the  inappropriateness  of  this 
approach.  Office  information  systems  are  in  their  infancy.  Now  is  the  time  to  consider 
the  human  factors  issues  involved  in  their  design  and  in  their  introduction  into  the 
office. 

In  this  paper  we  have  outlined  the  areas  in  which  there  is  a  need  for  human 
factors research in office information systems. These areas, data models, query
languages, communications and system-user interaction, relate to the interface
between office information systems and the people who will use them. Unlike other
computer systems, deficiencies in the human factors engineering of these systems will
impact on a large user population. The human factors aspects of their design will be as
important  as  the  facilities  they  provide. 


Abstract

This paper consists of three interrelated parts. In the first part forms are
introduced as an abstraction and generalization of business paper forms. A set
of facilities for the manipulation of forms and their contents is outlined. Forms
can be created, stored, found, viewed in different media, mailed and located by
office workers. Data on forms can also be processed in a completely integrated
way. The facilities are discussed both abstractly and in relation to a prototype
system. In the second part a facility is outlined for the specification and
implementation of automatic form procedures. These procedures specify actions on
forms which are triggered automatically when certain preconditions are met.
The preconditions, actions and specification method are based on forms. The
discussion is centered on our implementation of such a specification framework.
Finally, in the third part, techniques for the analysis of office flow are specified.
An algorithm is outlined for the categorization of forms in classes depending on
the local routing and actions on the forms. In this way, we can obtain the paths
that forms take and analyze the system for correctness and loading
characteristics.

1. INTRODUCTION

Office Information Systems* (OIS) are becoming increasingly important
both in terms of commercial products and in terms of research directions [Ellis
and Nutt 1980]. The main driving forces behind their emergence are user needs
[Zisman 1978] and technological developments [Tsichritzis and Lochovsky 1980].
We will not elaborate in this paper on the need for OIS. We also will not tackle
the thorny issues of user acceptance and socioeconomic problems related to
the introduction of office information systems in the office [Morgan 1980]. We
will concentrate on what we perceive to be an important research direction in
this area.

A serious technical problem related to OIS is integration. One aspect of
integration is the combination of facilities which in the past were provided by
different machines on different media, e.g., typing, copying, telephones, etc.
Another aspect of integration is to provide services which in the past were
provided by different companies and business sectors. At the same time an OIS
should provide a uniform interface to the user. These diverse requirements
necessitate coordination and integration at both the technical and the human
interface level. That is, voice communications, graphics, query languages, word
processing and electronic mail among others have to be available within the
same system. In addition, they have to present to the user a uniform and
flexible interface. The user should not have to be trained to use an individual system
with its own peculiarities for each service he needs. Achieving integration of
different facilities with a sound architecture and clean interfaces is a difficult
and challenging problem.


*We prefer the term Office Information Systems rather than Office Automation. Office
Automation has a negative image of unemployed persons and inhuman assembly lines of
office workers.

In the first part of the paper we will outline a set of facilities for an
integrated office information system. We will also discuss a subset of these
facilities which are incorporated in a prototype system. It is important to
differentiate between what we consider important to provide and what we have
implemented within our limited resources. In many cases we had to compromise
to scale down the implementation effort. We will try to state very clearly the
distinction between conceptual framework and implementation effort. We will
also try to indicate, whenever possible, the reasons for deviating from our
conceptual framework in the prototype system.

A second technical problem related to OIS is specification of office
procedures. For OIS to be truly helpful there is a need for the ability to specify
general office procedures. With this approach mundane and boring jobs can be
automated. These procedures should be invoked according to prespecified
conditions and require minimum user intervention. In this way, the system can
perform, according to specifications, many activities in the office. A powerful office
procedure specification language is an important aspect of an OIS. Office
procedure specification tools should allow the users to use the system in two
different ways: they can initiate operations using the facilities of the system
directly, or they can cooperate with the system to carry out prespecified office
procedures.

In the second part of the paper we will outline a facility for the specification
of office procedures. The facility allows users to specify in a simple way
preconditions under which the form procedures should be triggered. It also provides a
specification tool for the actions that the procedures should perform
automatically when they are triggered. The specification tool outlined has been
implemented and the main aspects of the implementation will be discussed.

A third important technical direction for OIS is tools for understanding,
modelling, analyzing and designing these systems. By introducing computers in
the office there is a risk of many unpredictable malfunctions. Human intuition is
not always present to catch abnormal behavior and handle special cases. If
there is a design oversight the system will not operate correctly. These
abnormal situations cannot be detected by inspection and informal techniques. There
is a need to portray the flow of documents, the coordination requirements and
the structure of the office information systems. There is a need for
requirements specification tools, design tools, and modelling and analysis tools.

In  the  third  part  of  the  paper  we  will  outline  an  approach  for  the  modelling 
and  analysis  of  some  office  characteristics.  We  will  concentrate  on  the  flow  of 
documents  in  the  office.  The  analysis  relates  local  conditions  in  a  station  with 
paths  that  documents  take  in  the  office.  Using  this  analysis  we  can  investigate 
global  correctness  properties  of  the  office  on  the  basis  of  local  behavior  as 
specified  by  automatic  form  procedures. 

2.  FORMS 

2.1.  Form  Definition 

To achieve integration of different facilities and services in our system, we
chose to have an integrating concept, that is, an idea which permeates our
system and our models. The integrating concept chosen is forms. Usually,
computerized forms are thought of as logical images of business paper forms.
However, forms in our framework can be much more general and elaborate than
business paper forms. Forms oriented facilities have been proposed and even
implemented in the past. For instance, forms have been proposed in the
CODASYL report for the end user facility [Lefkovitz et al]. They have been used
in BDL [Hammer et al. 1977] and in Officetalk [Ellis and Nutt 1980]. However, we
hope to demonstrate that forms have a different role in our system because of
the relative emphasis placed on them. All operations in our framework are based
on forms. The operations are sometimes issued through forms.

Forms represent a structured approach to office messages. We will assume
that it is both worthwhile and feasible to capture a regularity of messages in an
office through forms. Whether such regularity exists naturally is subject to
debate. It is not clear that there is always a way of constraining communication
and knowledge to be formatted. There are many cases where unconstrained
language in terms of text or speech is the only format we can assume for our
communication or recording of knowledge. Forms impose structure in
communication messages in the same way that formatted data impose structure on
knowledge. Forms guide the user in filling a request for information. The user is
not left free to create through synthesis any message he wants. This guidance
can be considered constraining by some people. At the same time, it is helpful
in formulating a precise message in a rather straightforward manner. The
precision and absence of ambiguity is also helpful for the receiver of the message
for a clear interpretation. We do not claim that forms can deal with every
situation. They deal with enough office activity, however, to warrant very careful
treatment.

Before we proceed with a definition of forms it is worthwhile to elaborate on
the nature of business paper forms. They can be thought of as text which
contains values in certain slots. A filled form can be thought of as a document.
However, some documents are not forms except in a trivial sense. Forms arise
because there are many documents which are similar. They are similar in that
they have the same structure and the same printed words. They differ from
each other in terms of the filled entries. We can view, therefore, forms as
corresponding to a text template. For each set of filled entries this text
template gives us a document. This document can be treated from then on as text if
we so choose. If we retain the structure of the form it is very easy to separate
the values from the template of the form. If, however, we lose the distinction
and look at a form as a document it may be difficult to reverse the mapping and
separate the template and values from the document. After this informal
discussion we will define forms by making some generalizations on text forms.

The first and most important generalization is the definition of forms in
terms of communication media other than text. We make the basic assumption
that messages in other communication media can have the same regularity as
text messages. They can, therefore, be amenable to the same treatment as
forms. A text form corresponds to a text template which, for certain values,
becomes a text document. In the same way we can have a speech form which
corresponds to a speech template for voice communication. Given certain
utterances as values, the speech template can generate a voice message. In the
same way, we can have a display form which corresponds to a template for a
2-dimensional visual display. We can have a print form which corresponds to a
print template for generating printed documents. Finally, the most primitive
view of a form is to retain only its attribute values.

The second generalization is to allow the same values to drive more than
one template, provided they are compatible. A form may have more than one
incarnation given that it has the corresponding templates. For instance, the
same values can drive a speech message, a text message, or a display message
provided the three templates are compatible. A form may even have two
incarnations within the same medium of communication. For instance, we may have
two text templates which will give two text messages for the same field values.
We may also choose to use the same template for more than one purpose. For
instance, a text template may be used as a display template and a print
template provided there are some standard default ways to display and print text.


The third generalization on business forms involves the relationship
between a form and its values. In business forms this relationship is rather
straightforward. There is a skeleton of text and appropriate slots where the
values are filled. However, the relationship between values and final documents
can be much more general. Consider, for instance, a letter template which for a
given set of names generates a standard form letter. Letters generated in this
manner lack a certain personal touch. There are many refinements that can be
incorporated in such a letter. For instance, suppose a letter is addressed to a
friend. The letter should begin with "Dear Fred" rather than "Dear Dr.
Lochovsky" and end with "Dionysis" rather than full name and title. All these
transformations can be effected if the mapping represented by the template is more
general. A template should not be considered as a skeleton where slots are
filled. It should be considered as a general mapping which, given the attribute
values, generates text, voice messages, etc. depending on the template.
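
The letter example can be sketched as follows. This is only an illustration in Python, not the notation of the prototype system; the names `Recipient` and `letter_template`, the `is_friend` attribute and the sample values are assumptions introduced here. The point is that the template is a function of the attribute values and may choose the salutation and signature, rather than a fixed skeleton with slots.

```python
from dataclasses import dataclass


@dataclass
class Recipient:
    """Attribute values the template receives (hypothetical attribute set)."""
    title: str
    first_name: str
    last_name: str
    is_friend: bool


def letter_template(recipient, sender_first, sender_formal, body):
    """Map attribute values to the text of a letter.

    The mapping decides how the values are rendered; it does more than
    copy them into slots.
    """
    if recipient.is_friend:
        salutation = f"Dear {recipient.first_name},"
        signature = sender_first
    else:
        salutation = f"Dear {recipient.title} {recipient.last_name},"
        signature = sender_formal
    return "\n".join([salutation, "", body, "", signature])


if __name__ == "__main__":
    fred = Recipient("Dr.", "Fred", "Lochovsky", is_friend=True)
    print(letter_template(fred, "Dionysis", "D. Tsichritzis", "See you Monday."))
```

The same attribute values with `is_friend=False` would yield the formal salutation and signature, so one set of values can produce different documents depending on the mapping.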


Finally, the fourth generalization involves the type of operations allowed on
forms. In business forms users are only allowed to read, enter values, send,
receive and locate forms. We may envision, however, other more general
operations that can be effected on forms. Consider, for instance, an invoice as a form.
The totals of the invoice are calculated by the user and entered in the invoice.
We could, however, define the field entry operation in such a way that the totals
are computed automatically. This is a type of operation which is not available in
business paper form processing. It is based on knowledge about the form's
contents. This knowledge about the form can be encoded in the form's operations
rather than staying outside the form system. This would allow semantic integrity
constraints and special side effects to be defined within the form framework.

There is a need for the ability to define operations which are specific to a
form  type.  In  this  framework  there  is  also  a  need  to  amend  the  operations 
issued  through  a  template  so  they  will  not  violate  semantic  constraints  which 
should  be  present  in  a  form.  Consider,  for  instance,  the  invoice  example.  Calcu¬ 
lating  the  totals  can  be  viewed  as  a  special  side  effect  of  entering  the  item  costs. 
Hence,  the  operation  of  entering  values  in  this  case  has  to  be  augmented  to  pro¬ 
duce  these  desirable  side  effects.  Suppose  a  user  wants  to  change  the  value  of  a 
total  in  an  invoice  through  a  template,  e.g.,  looking  at  the  invoice  as  text.  This 
operation  which  is  allowed  on  other  fields  should  be  disallowed.  That  is,  the 
modification  operation  is  augmented  with  a  side  effect  that  warns  the  user  that 
he  has  to  change  also  some  item  costs. 
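
The invoice example might look like the sketch below. It is written in Python purely for illustration and makes several assumptions that are not in the text: the class name `Invoice`, the method names, the use of integer cents for amounts, and the choice to reject the change and record a warning rather than interact with the user.

```python
class Invoice:
    """Hypothetical invoice form type whose total is derived from item costs."""

    def __init__(self, customer):
        self.customer = customer
        self.item_costs = []   # amounts in cents (an assumption of this sketch)
        self.total = 0
        self.warnings = []

    def enter_item_cost(self, cost):
        # Standard "enter value" operation, augmented with a side effect:
        # the total is recomputed automatically after every entry.
        self.item_costs.append(cost)
        self.total = sum(self.item_costs)

    def modify(self, field, value):
        # Standard "modify" operation, augmented so that the derived field
        # cannot be changed directly; the user is warned instead.
        if field == "total":
            self.warnings.append("total is derived: change the item costs instead")
            return False
        if field != "customer":
            raise KeyError(field)
        self.customer = value
        return True


if __name__ == "__main__":
    inv = Invoice("Acme Hardware")
    inv.enter_item_cost(1200)
    inv.enter_item_cost(350)
    print(inv.total)
    print(inv.modify("total", 1), inv.warnings)
```

Here the knowledge that the total depends on the item costs lives in the form's own operations, which is the sense in which the constraint is kept inside the form system.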

The definition of operations specific to form types gives them a status as
abstract data types. We chose to define type specific operations by augmenting
a set of standard operations with procedures associated with each form type.
The procedures generate the desirable side effects and change a standard
operation into a special operation for that type.

We are ready now to define forms. In our definition we will distinguish
between form types, form instances, forms and form templates. A form type
represents a data type defined for forms. A form instance is an instance of that
type which may incorporate additional information to the form attribute values.
The term form will be used informally to refer to the values of the attributes of
form instances. In many cases the terms form instance and form will be used
interchangeably. Finally, a form template is a mapping which maps a form
instance into a message in a particular communication medium. For instance,
in a business form the printed words will be part of the template. Given a set of
values for the slots, the template generates a text message.


A form type consists of a set of attributes $X_0, X_1, \ldots, X_n$, of which $X_0$ is a
special attribute (external identifier), and a set of operation procedures
$p_1, \ldots, p_n$. A procedure $p_i$ can be optionally associated with a standard
operation $q_i$ on a form. If no procedure is associated with an operation $q_i$ then the
standard definition of $q_i$ is performed whenever $q_i$ is issued. If a procedure $p_i$ is
associated with an operation $q_i$ then the procedure $p_i$ is invoked whenever $q_i$ is
invoked.

The special attribute $X_0$ is given values by the system. The value for $X_0$ is a
system wide unique identifier for each form instance. It can be seen by the user
but a user cannot modify its value. Every attribute $X_i$, except $X_0$, can have
multiple values within a form instance. In the general case repeating groups are
allowed as attributes. We can also have compound attributes in forms. That is,
the form type viewed as a relation does not have to be in first normal form. A
table can be handled as a single attribute inside a form.

The procedures $p_1, \ldots, p_n$ are initiated when the corresponding operations
are issued. The procedures may check domains of values, they may enforce
different constraints and trigger actions including actions affecting other
attributes and other forms. The procedures have available to them time of
operation, station where operation is issued and person issuing the operation. Some
of this information may be retained by the procedure at the time of operation
invocation and it will be denoted by $p_i(y)$ where $y$ denotes the particular
invocation of the operation $q_i$. If no procedure $p_i$ is specified for operation $q_i$, it means
that no special effects are needed and the operation will proceed in its standard
form. For instance, if no procedure is associated with entering a value for an
attribute $X_i$, it means that any value can be entered for $X_i$, at any time, by
anybody without triggering any special side effects.


A set of templates can be associated with a form type which define the
mappings from a form to a bit string which can be interpreted in a particular way.
For example, a data template R can strip a form instance of some of its data and
retain only certain values, a text template T can give the mapping of a form
instance to a text document, a template V can give the mapping to a voice
message, and so on for each communication medium. The same template can be used for more
than one communication medium provided there is an inherent correspondence
between them. If a template is not specified a default standard template may be
assumed. For instance, the default text template may be to list the form type
name X, the attribute names and their values. A template can only be used on a
form if it is compatible with the form's type, e.g., the template can accept the
attributes $X_0, X_1, \ldots, X_n$ as parameters. The templates are not an inherent part
of a form type's definition. Templates can be added at any point after a form
type's definition. In this way, users can set up or change templates to view
forms without affecting the basic definition of the form's type.

A form instance x of type X is a set of values $x_0, x_1, \ldots, x_n$ corresponding to
the attributes $X_0, X_1, \ldots, X_n$ and a set of values obtained by the application of
the operation procedures $p_i(y_j)$ for each invocation $y_j$ of operation $q_i$. A form
instance retains all information which the procedure returns after its
invocation. For instance, suppose $p_i$ is specified to return the person issuing the
operation $q_i$. In this case, a form instance will retain all persons which have
issued an operation $q_i$ to that form up to the moment we look at the form
instance. The form instance can retain all pertinent historical information in a
form's life cycle. The simple term form will denote the values of the attributes
$X_0, X_1, \ldots, X_n$ present in a form instance. In many cases the form is almost
equivalent to the form instance. The reason is that many of the procedures $p_i$
only check integrity constraints and they do not retain information.
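
The definitions of form type and form instance can be made concrete with a small sketch. It is an illustration under stated assumptions, not the prototype's implementation: the class and field names (`FormType`, `FormInstance`, `Invocation`, `retained`), the single standard operation `"enter"`, and the convention that a procedure performs the standard operation itself and returns whatever should be retained are all choices made here.

```python
import itertools
import time
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Invocation:
    """One invocation y of a standard operation, with its context."""
    operation: str        # e.g. "enter"
    attribute: str
    value: Any
    person: str
    station: str
    when: float = field(default_factory=time.time)


@dataclass
class FormType:
    name: str
    attributes: List[str]                       # X_1..X_n; X_0 is implicit
    procedures: Dict[str, Callable[..., Any]] = field(default_factory=dict)


_keys = itertools.count(1)


class FormInstance:
    def __init__(self, form_type):
        self.form_type = form_type
        self.key = next(_keys)                  # X_0, assigned by the system
        self.values = {a: None for a in form_type.attributes}
        self.retained = []                      # results p_i(y_j) kept so far

    def standard(self, inv):
        # Standard definition of the only operation this sketch supports.
        if inv.operation != "enter":
            raise ValueError(inv.operation)
        self.values[inv.attribute] = inv.value

    def issue(self, inv):
        procedure = self.form_type.procedures.get(inv.operation)
        if procedure is None:
            self.standard(inv)                  # no p_i: standard behaviour
            return
        result = procedure(self, inv)           # p_i invoked whenever q_i is
        if result is not None:
            self.retained.append((inv.operation, result))


def audited_enter(form, inv):
    """Example p_i: perform the standard entry and return the person."""
    form.standard(inv)
    return inv.person
```

With `audited_enter` attached to `"enter"`, the instance accumulates the persons who entered values, which is the historical information the text describes; with no procedure attached, `retained` stays empty and the form instance carries nothing beyond the form.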

A set of template instances can be defined corresponding to a form instance
according to different templates. For example, for a given form instance x we
can define a text instance T(x), a voice instance V(x) or a display instance D(x).
Given a form instance x its template instances can be produced by the definition
of the templates. Notice that a form instance x is not equivalent to any, or all,
of its template instances. The templates are functions on the form instance and
they do not have to be reversible. They may abstract some information retained
in x or may even transform the information in an irreversible way. Most
templates are not associated with form instances but only with the values of the
attributes in the form instance, i.e., the form. The form instances may retain
much more information that users seldom wish to see.

In  our  framework  the  user  can  view  a  message  in  many  formats,  e.g.,  voice, 
text,  display.  There  may  be  the  illusion  that  the  system  can  transform  between 
these  different  messages.  In  fact,  the  system  only  retains  the  internal  structure 
captured  by  the  form  instance  and  can  provide  only  transformations  from  it 
according  to  the  different  templates.  The  system  can  produce  voice  or  text,  but 
it does not understand voice or text except in the limited way encoded in the
form  instance. 

Consider as an example an application for order entry and billing for a
wholesale distribution company. Sales clerks operate the telephones and enter
orders as forms in the system. The orders go to the warehouse for filling and
then to the accounting office for billing. The accounting office retains a record
for the sale and reconciles it with the payments received eventually from the
customer. All templates can be very simple consisting of a skeleton and slots
with appropriate attribute names (Figure 1.1-1.6). Notice that, in this example,
the whole invoicing operation can be handled with just one form type having
different templates. The same form can go from Sales to the Warehouse to
Accounting having different templates in each case (Figure 2). Having one form
for the invoicing operation simplifies consistency checks between the values of
different forms. All manifestations of a sales transaction can be represented in
one form which has different incarnations according to the different templates.
There is only one copy of the values for each form which is viewed as an order, a
shipping slip, an invoice and a warehouse message. If official records need to be
kept in each station, then we will have to make official copies. In this case we
need to worry about consistency between these copies.
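
A rough sketch of the order entry example follows, assuming a single set of attribute values and three simple text templates. The attribute names, the sample values and the wording of each template are invented for illustration and do not reproduce the templates of Figure 1.

```python
# One copy of the values; amounts are in cents (an assumption of this sketch).
form = {
    "order_no": 1017,
    "customer": "Acme Hardware",
    "item": "hinges",
    "quantity": 40,
    "unit_price": 250,
}


def order_template(f):
    """View used by Sales."""
    return f"ORDER {f['order_no']}: {f['quantity']} x {f['item']} for {f['customer']}"


def shipping_slip_template(f):
    """View used by the Warehouse; the price is abstracted away."""
    return f"SHIP {f['quantity']} x {f['item']} to {f['customer']} (order {f['order_no']})"


def invoice_template(f):
    """View used by Accounting; quantity and price are folded into one amount."""
    amount = f["quantity"] * f["unit_price"]
    return f"INVOICE {f['order_no']}: {f['customer']} owes {amount / 100:.2f}"


if __name__ == "__main__":
    for template in (order_template, shipping_slip_template, invoice_template):
        print(template(form))
```

Because every view is computed from the same values, a change to `quantity` shows up in all three views with no consistency check between separate forms. The invoice view also shows why templates need not be reversible: the quantity and unit price cannot be recovered from the amount alone.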

It is appropriate at this point to comment on the difference between form
types and instances as defined in this section and records as traditionally viewed
in data base management systems. The first obvious difference is the
interpretation in terms of the "printed words" given to the values of a record. Our
framework also allows many different interpretations according to different
templates. However, we feel that this is only a very superficial difference. It has to
do with formatting input and output rather than forms as data types. A second,
and much more important, difference is in terms of operations and side effects.
Operations on records are usually uniformly defined. That is, there is no
difference in the meaning of the operation between different record types. In
forms there is a very real need to customize operations according to types. This
is allowed in advanced data models like semantic networks. It is, however,
absent in most standard data models and definitions of records. We have
followed in our definition the same approach as found, for instance, in Abrial's work
[Abrial 1974]. The operations are customized by augmenting a standard
operation with a procedure. We also allow and expect side effects. That is, an
operation on a form can have side effects on this form, or on other forms. Side effects
are very important in an office environment. They represent real needs and not
spurious, unwanted operations. Some advanced data models like semantic
networks do allow side effects. The standard data models try to constrain or
eliminate them. They try to localize operations in one record. For instance, side
effects in the relational model have been called anomalies, i.e., undesirable.
There is a whole data base theory which tries to push them aside, with
sometimes limited success. In terms of forms, side effects are important, they are
needed and they should be handled with precision and elegance.

2.2.  Forms  in  our  system 

In  our  prototype  system  forms  are  further  restricted  in  many  ways  for  ease 
of  implementation  [Tsichritzis  1980,  Kornatowski  and  Cheung  1980].  First,  attri¬ 
butes  are  not  compound  and  they  are  single  valued.  The  main  reason  for  this 
restriction  is  to  avoid  difficulties  with  displaying  arbitrary  repeating  groups 
under the display template. It appears, however, that users would like to have
repeating groups and they should be provided. It is also a good idea to optionally
retain old values as part of the form instance. In our prototype system only
the most current value for an attribute is retained. Second, the operation procedures
are rudimentary. They only check the range of values of attributes at entry
time and retain the station number from which the entries are made. They also
check  an  attribute  characteristic  which  divides  attributes  in  three  kinds:  not 
modifiable  and  to  be  filled  at  creation  time,  not  modifiable  but  can  be  filled 
later,  and  modifiable. 
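
The rudimentary checks described above can be sketched as follows. This is a
hedged illustration in Python, not the prototype's code; the names Kind,
Attribute and enter, and the representation of a form instance as a dictionary,
are assumptions made for the example.

```python
from enum import Enum

class Kind(Enum):
    FIXED_AT_CREATION = 1   # not modifiable, must be filled at creation time
    FIXED_FILL_LATER = 2    # not modifiable, but may be filled later
    MODIFIABLE = 3          # may be changed after it has a value

class Attribute:
    def __init__(self, name, kind, bounds=None):
        self.name = name
        self.kind = kind
        self.bounds = bounds      # optional (low, high) range, inclusive

def enter(form, attr, value, station, creating=False):
    """Enter a value, checking range and attribute kind; record the station."""
    if attr.bounds is not None:
        low, high = attr.bounds
        if not low <= value <= high:
            raise ValueError(f"{attr.name}: {value} is out of range")
    if attr.name in form and attr.kind is not Kind.MODIFIABLE:
        raise PermissionError(f"{attr.name} is not modifiable")
    if attr.kind is Kind.FIXED_AT_CREATION and not creating:
        raise PermissionError(f"{attr.name} must be filled at creation time")
    # Only the most current value is kept, with the entering station number.
    form[attr.name] = (value, station)
```

Single-valued attributes map naturally onto one dictionary entry each; repeating
groups and retained old values would need a richer representation.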

We  feel  that  the  operation  procedures  should  be  used  to  provide  many  more 
facilities.  An  important  capability  is  triggered  operations  which  affect  other 
fields. In this way, for instance, a user does not have to specify redundant
entries. They will be entered automatically. A second important capability
which should be added in operation procedures is more general constraint
checking. There are many interrelationships between the attribute values which
should be checked for consistency. Finally, more information should be
retained by the operation procedures as part of the form instance. Specifically,
it would be nice to retain the subject's signature plus the station number and
the time of entry.

The third restriction in our prototype system deals with the allowed templates.
Our data template just retains the values of the attributes plus station
numbers  during  attribute  entry.  The  text  template  is  very  simple  with  printed 
text  and  slots  for  attribute  values.  We  are  working  on  the  definition  and  imple¬ 
mentation  of  much  more  elaborate  text  templates.  This  area  is  important  in 
practice.  For  instance,  using  sophisticated  text  templates  we  can  personalize 
data  driven  computer  produced  documents. 

Our current version of the system does not support voice templates. We are
working, however, on a limited capability for voice templates. In this way, we
can incorporate voice messages in forms. Finally, our display templates are
rather unimaginative, having a direct correspondence to the text template. We
are working on much more sophisticated templates for display capabilities. In
this way, we can uncouple the form manipulation and processing from the peculiarities
of the output devices and complex formatting requirements of the user.

Our  prototype  system  provides  only  single-paged  forms.  Users  of  the  sys¬ 
tem  established  a  need  for  multi-paged  forms.  One  of  the  reasons  for  the  need 
for  multi-paged  forms  is  that  with  character  displays  one  cannot  fit  as  much  on 
the  screen  as  on  a  page  of  a  business  form.  Hence,  even  single-paged  business 
forms  did  not  fit  onto  one  of  our  "pages".  With  bit  map  displays,  however,  a  page 
is a page. Hence, the limitation seems to be more on our displays rather than
our  facilities.  We  still  feel  that  there  is  a  genuine  need  for  multi-paged  forms. 

Our  prototype  system  implements  the  special  attribute  as  a  triple  (form 
type  number,  unique  number,  copy  number).  The  form  type  number  uniquely 
determines the form type within the system. The unique number is guaranteed
to be unique system wide for that type. This is achieved by issuing groups of
unique numbers to stations from the control node. This operation resembles
giving away uniquely identified blank paper forms in paper systems. The copy
number identifies an official copy of the form and it is kept unique within a form
instance by strict control of the copying.
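
The issuing of groups of unique numbers can be sketched as follows. This is a
minimal Python illustration under stated assumptions: the block size, the class
names, and the convention that copy number 0 denotes the original are all
hypothetical and are not taken from the prototype.

```python
from collections import namedtuple

# (form type number, unique number, copy number); copy 0 stands for the
# original here, which is an assumption of this sketch.
FormId = namedtuple("FormId", "form_type unique copy")

class ControlNode:
    """Hands out disjoint blocks of unique numbers, one counter per form type."""
    def __init__(self, block_size=100):
        self.block_size = block_size
        self.next_free = {}          # form type -> first number not yet issued

    def issue_block(self, form_type):
        start = self.next_free.get(form_type, 1)
        self.next_free[form_type] = start + self.block_size
        return range(start, start + self.block_size)

class Station:
    def __init__(self, node):
        self.node = node
        self.blocks = {}             # form type -> iterator over issued numbers

    def new_id(self, form_type):
        it = self.blocks.get(form_type)
        number = next(it, None) if it is not None else None
        if number is None:           # no block yet, or the block is used up
            it = iter(self.node.issue_block(form_type))
            self.blocks[form_type] = it
            number = next(it)
        return FormId(form_type, number, 0)
```

A station only contacts the control node when its block runs out, much as an
office only asks for more pre-numbered paper forms when the pad is finished.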

We  are  ready  at  this  point  to  discuss  form  operations.  We  will  outline  opera¬ 
tions  as  they  apply  to  our  conceptual  framework  and  as  they  are  implemented  in 
the  prototype  system.  First,  we  will  discuss  form  operations  which  apply  to  indi¬ 
vidual  instances  of  forms.  This  facility  is  provided  for  office  workers  and  relates 
to  user  needs  to  handle  forms  as  messages  and  documents.  In  our  prototype 
system this facility is provided by OFS (Office Form System) [Kornatowski and
Cheung 1980]. We will also discuss operations on the data of the forms. This
facility enables data processing users to treat data on forms as part of a data
base. The facility is provided in our prototype system by MRS (Micro Relational
System) [Kornatowski 1979]. We will not give many details about the syntax of
operations in this paper. Details about the user interfaces of the system can be
found in the users manuals. Both OFS and MRS provide an environment for the
direct manipulation of forms or data on forms by the users. Both systems are
passive, i.e., they do not initiate any actions by themselves. Automatic office
procedures  will  be  discussed  separately. 

2.3.  Form  Operations 

Operations  on  form  instances  are  issued  from  stations.  Stations  can  be  per¬ 
sonal  computers,  or  processes  running  on  bigger  computers.  Stations  are 
uniquely identified. The operations apply to the form instance. The user may
issue the operation while viewing a template instance. However, the operation is
issued in terms of the form instance. In this way, we avoid the difficulties with
mapping an unrestricted template instance operation into a form instance
operation. For instance, an operation in terms of text or voice may not have a
direct counterpart in a form operation. This implies that the user is restricted
with respect to operations on template instances. Suppose the user wants to
issue free text operations on a text template instance. He can do so only if he
obtains the text template instance as text outside the form system. However,
the changes he makes from then on will not be reflected in the form instance.
Our approach is related to the way view updates are handled in relational data
base systems. The operations on template instances can only go through if
there  is  a  unique  way  to  map  them  to  modifications  of  the  underlying  form 
instances.  This  situation  implies  that  the  template  instance  operations  are 
sometimes  constrained. 

There  is  a  tradeoff  between  the  need  for  general  template  mappings  and  the 
ability  to  issue  operations  through  them.  If  a  template  is  very  complex  it  is  hard 
for  the  user  to  isolate  the  attributes  and  issue  operations  on  them  through  the 
template.  Very  elaborate  templates  are  very  appropriate  for  final  output  but 
not  during  processing  of  a  form  instance.  For  storage  and  processing  purposes 
it  is  much  simpler  to  view  form  instances  through  simple  templates.  For 
instance, if a template is a straightforward "fill in the slots" type of template,
operations can be issued through it. The system will translate these operations
to  operations  on  the  form  instance.  The  user  is  able  under  these  circumstances 
to  merge  in  his  mind  the  template  instance  and  the  form  instance.  He  thinks 
that  the  operations  apply  on  the  forms  as  he  sees  them. 

The  first  form  operation  provided  is  entry  of  attribute  values  in  forms. 
Entering  a  value  does  not  necessarily  mean  that  the  old  value  of  the  attribute,  if 
present,  is  discarded.  The  attribute  value  entry  operation  can  be  defined  to 
retain  the  old  value  in  the  form  instance.  In  that  case  the  attribute  value  entry 
operation is augmented with a procedure which retains the old values. The old
values present in the form instance can be masked out later on by the templates.
By retaining a historical account of old values in the form instance we
can mirror some paper form operations. The operation can be analogous to
crossing  out  and  rewriting  as  effected  on  paper  forms.  An  entry  procedure  can 
also  retain  person  signature  and  time  for  the  operation.  In  this  way,  attribute 
values  can  be  related  to  a  person  for  accountability  purposes. 

The  attribute  entry  operation  is  usually  done  through  a  simple  template. 
However,  values  can  be  entered  automatically  on  form  instances.  For  example, 
this happens if we load the forms with values as they appear in a file. Before
entering  any  values  the  system  will  fill  the  unique  identifier  of  the  form  instance. 
Entry  procedures  associated  with  the  attributes  can  generate  side  effects  which 
will  enter  values  for  other  attributes.  If  the  procedures  are  not  specified,  the 
user  will  have  to  enter  all  the  values  and  perform  the  associated  calculations. 
One  of  the  important  advantages  of  the  automatic  side  effects  is  an  immediate 
response  to  the  user.  In  this  way  abnormal  situations  can  be  quickly  detected. 
This capability implies the availability of additional information to the procedures
specified at attribute entry. For example, in an inventory application
an  order  entry  procedure  can  check  inventory  information  about  availability  of 
items. 
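
Entry procedures with side effects can be sketched as follows. This is a hedged
Python illustration, not the mechanism of the active part of the system; the
attribute names, the INVENTORY table and the rule that a rejected value is
undone are assumptions made for the example.

```python
# Hypothetical entry procedures: each maps an attribute to a function that is
# run after a value is entered and may fill other attributes or raise.
INVENTORY = {"widget": 40}          # assumed external inventory information

def after_item(form):
    if form["item"] not in INVENTORY:
        raise ValueError("unknown item")

def after_quantity(form):
    if form["quantity"] > INVENTORY[form["item"]]:
        raise ValueError("not enough stock")   # immediate response to the user
    # Side effect: the total is derived, so the user need not enter it.
    form["total"] = form["quantity"] * form["unit_price"]

PROCEDURES = {"item": after_item, "quantity": after_quantity}
_MISSING = object()

def enter(form, name, value):
    """Standard entry operation augmented with an optional procedure."""
    old = form.get(name, _MISSING)
    form[name] = value
    try:
        procedure = PROCEDURES.get(name)
        if procedure is not None:
            procedure(form)
    except ValueError:
        # Undo the entry so a rejected value leaves the form unchanged.
        if old is _MISSING:
            del form[name]
        else:
            form[name] = old
        raise
```

Attributes without a procedure behave as plain data entry, which matches the
case where the user must enter every value and do the calculations himself.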

There is a small editor associated with filling entries to allow correction of
mistakes. The system guides the user to fill all the necessary fields and retains
the  station  number.  Entry  of  values  for  some  attributes  can  be  delayed  until 
later.  An  attribute  can  also  be  declared  to  have  a  mandatory  value  before  the 
form  instance  is  stored.  In  the  passive  part  of  our  system  (OFS)  we  do  not  have 
entry procedures implemented [Kornatowski and Cheung 1980]. In the active
part  of  our  system  (outlined  in  the  next  section)  we  can  generate  side  effects 
upon  value  entry. 

Entering values in form attributes is a separate operation from storing a
form. The user may want to look at a form before storing it in the system. A
form is stored in a file or a heap. A file is a form container which can only
receive  form  instances  of  the  same  type.  A  heap  can  receive  arbitrary  types  of 
form  instances.  Form  values  are  stored  separately  from  templates. 

A form outside a file or heap does not have official status as a form. In this
way, official form instances are distinguished from unofficial entries with the
same information. This separation is pervasive in our framework. A form can be
copied unofficially by retaining the same data that appear in the attributes.
Such a copy can be looked at as a record in a file and does not have any official
status as a form. The system cannot prevent a user from doing this operation
provided he had access to the form. However, the system does not have any
obligation to keep this record consistent, or to treat it the same way as an
official form. A form can be copied into another form of the same type. That is,
a form can be filled with the same information and stored in the system. A user
is perfectly capable of doing this operation; however, the new form instance will
have a different identification number. In this way, there is a clear difference
between the original and a copy. Finally, a form can be copied officially. This
operation can be done by the control node operating a special bonded station.
Such official copies are accounted for by the system, i.e., the system knows how
many there are and where they are.

In  our  prototype  system,  copying  of  forms  is  strictly  controlled.  A  local 
copy  of  a  form  is  allowed  but  it  will  have  different  station  signatures  associated 
with  its  attributes  and  a  different  identifier.  If  the  user  specifies  that  he  wants 
the same data to be put in a number of copies the system will generate a
number of form instances which have the same unique identifier but different
copy numbers. In this way all the data and associated signatures on the
form  copies  will  be  identical  except  the  copy  number. 

The main reason for putting so many restrictions and so much emphasis on
copies is to control the desired side effects on them. Unofficial copies are not
important as far as the system is concerned. The system is not obliged to keep
them consistent. The users can draw information from them knowing that the
information may not be accurate. Formal copies, on the other hand, should stay
consistent. This consistency is enforced by operation procedures associated
with entry of values for attributes. The consistency can be specified in many
ways.  For  instance,  the  procedures  may  specify  that  any  change  in  a  copy 
should  reflect  on  all  copies  and  the  original.  Another  example  of  consistency 
specification  is  to  define  procedures  which  reflect  every  change  of  the  original  to 
the  copies  but  not  vice  versa.  That  is,  copies  can  be  changed  independently  but 
they  at  least  inherit  every  change  of  the  original.  In  all  these  situations,  the  sys¬ 
tem  must  have  some  knowledge  of  how  many  official  copies  there  are  and  hope¬ 
fully  where  they  are.  This  is  achieved  by  controlling  carefully  the  copy  opera¬ 
tion. 
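
The two consistency specifications mentioned above can be sketched as follows.
This is a minimal Python illustration; the class name, the policy names "all"
and "original_only", and the use of list position as the copy number are
assumptions of the sketch rather than features of the prototype.

```python
class OfficialForm:
    """An original and its official copies, with a propagation policy."""
    def __init__(self, values, policy="all"):
        # policy "all": a change anywhere is reflected everywhere.
        # policy "original_only": copies inherit changes to the original,
        # but a change made on a copy stays local to that copy.
        self.policy = policy
        self.instances = [dict(values)]      # index 0 is the original

    def make_copy(self):
        """Controlled copy operation: the system knows every official copy."""
        self.instances.append(dict(self.instances[0]))
        return len(self.instances) - 1       # the copy number

    def enter(self, copy_number, name, value):
        if self.policy == "all" or copy_number == 0:
            targets = self.instances
        else:
            targets = [self.instances[copy_number]]
        for inst in targets:
            inst[name] = value
```

Either policy depends on make_copy being the only way to create an official
copy, which is the point of controlling the copy operation carefully.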

Modification  of  attribute  values  is  related  to  entry  of  attribute  values.  In 
our  framework  there  is  no  need  for  a  different  modification  operation.  Data 
entry  can  be  used  as  a  modification  if  the  attribute  had  a  value  before.  We  can 
specify separate modification operations which are, in essence, data entry
operations which erase the old value from the form instance. In our prototype
system  modification  operations  are  specified  through  separate  commands  and 
they  have  the  usual  meaning.  Only  certain  attributes  which  are  declared  to  be 
modifiable  can  be  modified.  The  old  values  are  discarded  and  the  station  signa¬ 
ture  associated  with  the  new  value  is  kept. 

Deletion  operations  of  form  instances  have  to  be  very  carefully  controlled. 
Forms  are  official  entries  and  should  not  be  deleted  arbitrarily.  We  feel  that  the 
deletion operation should be augmented with automatic procedures which notify
the control node about the deletion and give the system a chance to archive the
form instance. In this way the form instance will disappear from the user's view,
but  it  will  not  disappear  from  the  system. 

In our prototype system there is no deletion operation. Once a form
instance has been created it cannot be disposed of arbitrarily. The form has to
be sent to a disposal station which serves as a wastebasket. The disposal station
can be local or global. This station can either shred electronically, or archive
the form in question. A disposal station can be specified which separates
ephemeral  from  important  form  instances.  The  important  form  instances  can 
generate  the  data  bases  needed  for  running  the  organization.  In  this  way  data 
entry  as  an  operation  is  substituted  by  siphoning  of  information  from  the  form 
system  over  to  the  operational  data  bases  of  the  organization. 

By  controlling  form  instance  deletion  we  can  impose  a  law  of  conservation 
of forms. Forms originate in particular stations. Their origin is known and controlled
through assignment of unique form instance numbers, e.g., invoice #.
Forms terminate only in disposal stations. There is no other way for a form to
exit the system. Hence, at any point in time the system can find out exactly how
many  form  instances  there  are  and,  using  the  locate  commands  as  we  shall  see 
later,  where  they  are. 

A very important set of commands deals with moving forms between stations
in the system. These operations implement an electronic mail facility.
The  procedures  assigning  side  effects  to  send  and  receive  operations  are  very 
important.  These  procedures  can  notify  the  control  station  about  movement  of 
forms.  In  this  way,  it  can  keep  a  running  account  of  where  all  forms  are,  both 
original  and  official  copies.  These  logs  provide  three  functions  in  the  system. 
First,  they  can  be  used  for  giving  an  overall  status  of  the  system  regarding 
bottlenecks  in  communication  or  overloading  of  stations.  Second,  they  can  be 
used for locating forms in stations. Third, they can be used for recovery when a
station malfunctions. They provide information about what official forms were
present in the stations. All this information is very critical and sensitive regarding
the operation of the overall system. Access to it should be carefully controlled.
In addition, the users should be aware of and agreeable to this collection
of  information.  The  capability  of  retaining  knowledge  about  the  operations  of 
the  system  can  be  threatening  to  individuals.  It  may  reflect  on  performance  of 
people  in  their  jobs.  It  should  be  present  only  if  the  persons  using  the  system 
consider  these  facilities  to  offer  advantages  in  their  work  and  not  threaten  their 
existence,  or  investigate  their  work  habits. 

In  our  prototype  system  form  instances  are  passed  around  using  send  and 
receive  commands.  Only  the  values  are  transmitted.  The  templates  are  avail¬ 
able  separately.  This  approach  enables  a  station  to  send  many  forms  without 
sending  always  their  templates.  Forms  are  deposited  in  mail  trays.  Each  sta¬ 
tion  owns  a  set  of  mail  trays.  Other  stations  can  deposit  forms  in  the  mail  trays. 
Only the owner station can pick forms from its own mail trays. There is very little
interference between the actions on the mail trays. In this way, there is no
need  for  complicated  locking  and  synchronization  facilities.  In  our  prototype 
system  all  mail  trays  reside  in  the  control  node.  In  such  an  environment  mail 
can  be  sent  to  a  station  even  if  it  is  completely  dead.  It  takes  an  action  from  the 
user  to  receive  his  mail.  The  user  decides  both  the  forms  he  wants  to  receive 
and  the  time  he  wants  to  receive  them.  In  the  next  section  we  will  discuss 
automatic  procedures  which  can  be  specified  to  receive  mail  automatically  on 
behalf  of  a  station  from  the  control  node. 
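
The mail tray discipline described above can be sketched as follows. This is a
hedged Python illustration; the method names send and receive follow the
commands named in the text, but their parameters and the first-in, first-out
order of a tray are assumptions of the sketch.

```python
from collections import defaultdict, deque

class ControlNode:
    """Holds every mail tray, so mail can be sent to a station that is down."""
    def __init__(self):
        self.trays = defaultdict(deque)      # (owner station, tray name) -> forms

    def send(self, to_station, tray, form_values):
        # Any station may deposit; only the values travel, not the template.
        self.trays[(to_station, tray)].append(form_values)

    def receive(self, requester, owner, tray):
        # Only the owner may pick forms from its own trays.
        if requester != owner:
            raise PermissionError("only the owner station may empty its trays")
        queue = self.trays[(owner, tray)]
        return queue.popleft() if queue else None
```

Since many stations only append and a single owner only removes, the sketch
needs no locking beyond what one tray operation requires.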

All mailed forms in our system pass through a control node. It can be
argued that this creates an artificial bottleneck. There are, however, some
advantages to this approach. First, it is very easy to handle the problem of
disconnected stations. The forms stay in the control node. Second, a log is kept
in the control node about the movement of forms. This log enables the system
to  locate  and  trace  forms.  These  are  privileged  commands  in  the  system  which 
can  provide  the  exact  location  where  a  form  is.  The  system  knows  where  a  form 
was  last  sent  from  the  log.  There  is  also  no  capability  for  deletion  of  a  form 
instance  within  a  station.  A  trace  command  provides  a  complete  path  that  a 
form  took  from  its  originating  station  up  to  where  it  is  at  the  moment.  The  only 
forms which cannot be traced are private form instances which originate in a
station and are kept there. Even in that case the system knows that a form
instance was created. A unique identifier was assigned and used in that particular
station.
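
The locate and trace commands can be sketched on top of a movement log as
follows. This is a minimal Python illustration; the log layout and the method
names are assumptions, and recording the creating station as the first log
entry is a simplification of how the issue of unique identifiers is tracked.

```python
class MovementLog:
    """Log kept at the control node: one entry per creation or send."""
    def __init__(self):
        self.entries = []                    # (form id, from station, to station)

    def created(self, form_id, station):
        self.entries.append((form_id, None, station))

    def sent(self, form_id, src, dst):
        self.entries.append((form_id, src, dst))

    def trace(self, form_id):
        """Complete path from the originating station to the current one."""
        return [dst for fid, _, dst in self.entries if fid == form_id]

    def locate(self, form_id):
        """Station the form was last sent to, or None if it is unknown."""
        path = self.trace(form_id)
        return path[-1] if path else None
```

The answer of locate is only reliable because forms cannot be deleted inside a
station: the last destination in the log is where the form still is.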

A  group  of  commands  deals  with  finding  form  instances.  Once  a  form 
instance  is  found  we  are  not  restricted  to  viewing  it  according  to  a  single  tem¬ 
plate.  The  different  templates  give  us  many  exciting  alternatives.  As  a  matter 
of  fact,  we  can  issue  an  operation  while  viewing  forms  in  one  template  and  get 
the  results  in  another  template.  For  instance,  we  can  query  through  the  display 
template and get results in a voice message. Appropriate keywords in the
access operation like view, print, read and say can direct the system to use a
particular template.

In our prototype system forms can be accessed sequentially. They can also
be accessed using some Boolean expression of conditions on attribute values.
For instance, we can ask for the next form in a file, or the (first) form with Customer
name Smith in a file. All access commands deal with one form at a time.
If a user wants to access many forms together he can connect them into a dossier.
A dossier is a logical structure which threads together a number of
semantically  close  forms,  e.g.,  all  invoices  of  a  particular  item. 
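
Access by a Boolean condition and the threading of forms into a dossier can be
sketched as follows. This is a hedged Python illustration; the function and
class names are hypothetical, a file is modelled as a plain list, and the
conditions are ordinary Python predicates rather than the syntax of OFS.

```python
def find_first(file, condition, start=0):
    """Return (position, form) of the first form from `start` satisfying
    `condition`, or None; access is always one form at a time."""
    for pos in range(start, len(file)):
        if condition(file[pos]):
            return pos, file[pos]
    return None

class Dossier:
    """Threads together forms that stay in their own files."""
    def __init__(self):
        self.refs = []                       # (file, position) pairs

    def connect(self, file, pos):
        self.refs.append((file, pos))

    def forms(self):
        return [file[pos] for file, pos in self.refs]
```

The dossier holds references rather than copies, so the forms keep their single
official identity while being viewed together.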


All form commands in our prototype system deal with local files or heaps. A
user cannot operate on forms inside the electronic drawers of somebody else's
desk. This constraint eliminates interference and permits an independent
operation  for  each  station.  To  operate  on  a  form  in  a  foreign  station  a  user  has 
the  following  alternatives.  First,  he  can  walk  to  that  station  and,  provided  he  is 
authorized  to  use  the  station,  he  issues  an  operation  on  the  form.  Second,  he 
can  ask  the  other  station  to  send  the  form  to  his  station  for  the  operation. 
Third,  he  can  ask  the  other  station  to  do  the  operation  for  him.  In  all  scenarios 
the  control  rests  with  the  foreign  station. 

Data  on  the  forms  are  specially  treated  in  our  framework.  Some  users  do 
not want to see and operate on only one form at a time. They would rather summarize
data of several forms and operate on data which are distributed on many
similar  forms.  This  requirement  implies  that  data  on  forms  are  available  for  data 
base  operations.  The  data  base  commands  should  be  allowed  directly  on  the 
data  of  the  forms  without  any  explicit  loading  into  a  separate  data  base  system. 
The  office  workers  manipulate  the  forms  during  the  day  to  day  operations  of  the 
office.  The  availability  of  data  base  operations  enables  office  managers  to  view 
the whole operation and obtain management reports through further processing.

In  our  system  the  data  base  facility  is  provided  by  treating  a  data  template 
of  a  form  type  as  a  relation  definition.  The  data  template  retains  only  the  values 
of the attributes (the form) and the signatures of the stations where the values
were entered. All form instances of each type local to a station can be accessed
using relational commands. The operations are similar to SQL [Astrahan et al
1976] and are provided by a relational data base system called MRS [Hudyma et
al 1981, Kornatowski 1979]. There is no loading operation to turn data in a
form instance into data as part of the data base. The same data are viewed as
messages through, for instance, a text template and as data instances. This
facility presupposes a complete integration of facilities at a certain level of the
system. In our case, for instance, files and indexes are shared by the OFS system,
which manipulates the data as forms, and the MRS system, which manipulates
the data as part of a data base. In fact, a form as it appears on the screen
is  never  stored.  The  values  and  the  templates  are  stored  separately.  The  sys¬ 
tem  merges  them  specifically  for  user  viewing. 

At  this  point  it  is  appropriate  to  comment  on  the  desirability  of  associating 
constraints  on  the  data  and  the  operations  on  them  by  augmenting  the  standard 
operations  on  a  form  instance.  The  data  has  a  multifaceted  role  due  to  the 
existence  of  many  templates.  The  operations  issued  through  the  templates 
reflect  more  the  nature  of  the  template.  Hence,  the  operations  through  the 
template cannot naturally be constrained with inherent restrictions in their
meaning.  For  instance,  the  update  facility  of  the  data  base  operations  does  not 
have  the  same  controls  as  the  modification  operations  of  a  form  system.  How¬ 
ever,  both  of  them  are  augmented  by  the  same  procedures  as  specified  in  the 
form  type.  In  this  way,  the  same  constraints  on  operations  can  be  enforced  no 
matter which template they are issued from. The restrictions are not put on the
syntax  of  operations  associated  with  a  particular  template.  They  are  part  of  the 
definition  of  the  form  type  as  provided  by  the  operation  procedures. 

To illustrate the use of data base commands consider an example of order
entry and billing. Suppose that a copy of the form instance regarding a sale is
retained in the accounting office station. There is often a need to obtain some
sales statistics. Suppose, for example, that a manager wants to get a list of customers
who have outstanding bills. He can obtain this list by issuing a data
base  command  directly  on  the  data  of  the  form  instances. 

SELECT  Customer  name  FROM  Invoice  WHERE  Status=  payable 
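
A query of this shape can be tried out with a present-day relational engine.
The sketch below uses Python's standard sqlite3 module as a stand-in; it does
not reproduce the MRS command syntax, and the table layout, the column names
and the sample rows are hypothetical.

```python
import sqlite3

# In-memory stand-in for the Invoice data template viewed as a relation.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Invoice (customer_name TEXT, amount REAL, status TEXT)")
conn.executemany(
    "INSERT INTO Invoice VALUES (?, ?, ?)",
    [("Smith", 120.0, "payable"), ("Jones", 75.5, "paid"), ("Brown", 10.0, "payable")],
)

# Customers with outstanding bills, as in the manager's query above.
rows = conn.execute(
    "SELECT customer_name FROM Invoice WHERE status = ? ORDER BY customer_name",
    ("payable",),
).fetchall()
outstanding = [name for (name,) in rows]
conn.close()
```

In the form system no such loading step exists; the INSERT statements here only
play the role of form instances that are already stored at the station.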


In our system there is a dichotomy between the form operations and the
data base operations. Data base operations follow a traditional approach, i.e.,
a relational query language. This situation came about historically. The relational
system MRS was developed before OFS and we decided to make use of it. A
better approach would have been to have a forms approach to data base query,
that is, to formulate data base queries by filling forms. Such an approach has
appeared in the literature [Zloof 1980, Luo and Yao 1981]. By having a forms
oriented data base language we can achieve a completely uniform interface.
Form operations are in terms of forms, data base operations are in terms of
forms.  Finally,  as  will  be  seen  in  the  next  section,  form  procedures  are  in  terms 
of  forms.  Rather  than  change  our  query  language  we  are  concentrating  instead 
on  a  distributed  data  base  capability. 

All  form  operations,  including  data  base  operations,  operate  on  forms  local 
to the station. However, we feel that there are occasions when a privileged user
should be allowed to query data which are distributed in many stations. For
instance, suppose that each salesman has a local station where he keeps orders.
Maybe the sales manager should be allowed to ask "How are sales doing?". There
is only one way that this operation can be effected with the facilities described
up to now. The sales manager has to send a message to all salesmen requesting
them to report their progress. It would be nice if the system could handle such
distributed queries directly. In this facility a global query can be issued from a
station. A global query has a scope which is defined as a group of stations on which
it is served. This query translates into a series of local queries on local stations.
All of these local queries are orchestrated in such a way that each form instance
is accounted for in the query exactly once. The system is not necessarily frozen
during the operation of the global query. Forms continue to flow through the
system  while  the  global  query  is  executed. 
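
The translation of a global query into local queries can be illustrated with a
small sketch. It is only an illustration under stated assumptions: the Station
class, the global_query function and the use of a form key to avoid counting a
form twice are hypothetical names and choices made for this example, not the
design of the prototype.

```python
from dataclasses import dataclass, field


@dataclass
class Station:
    """A hypothetical workstation holding form instances keyed by form key."""
    name: str
    forms: dict = field(default_factory=dict)  # form key -> dict of field values

    def local_query(self, predicate):
        """Return (key, form) pairs for the local forms satisfying predicate."""
        return [(key, form) for key, form in self.forms.items() if predicate(form)]


def global_query(scope, predicate):
    """Run predicate at every station in scope, counting each form key once."""
    seen = set()
    result = []
    for station in scope:
        for key, form in station.local_query(predicate):
            if key in seen:  # already counted at an earlier station
                continue
            seen.add(key)
            result.append(form)
    return result


# Two salesmen's stations; order 2 is seen at both because it moved between
# stations while the query was running.
alice = Station("alice", {1: {"item": "hat", "amount": 40},
                          2: {"item": "coat", "amount": 120}})
bob = Station("bob", {2: {"item": "coat", "amount": 120},
                      3: {"item": "hat", "amount": 45}})

orders = global_query([alice, bob], lambda form: form["amount"] > 0)
total_sales = sum(form["amount"] for form in orders)
```

Skipping keys already seen only prevents double counting. A form that moves
from a station not yet visited to one already visited would still be missed,
which is why some coordination of the local queries is needed.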

A distributed query capability is very important in an office environment. A
very effective way to communicate is to allow other persons to view common
data. This mode is especially attractive when we want to broadcast information.
This is the basic approach used in data base management to effect
communication, by posting data in a common data base. A form system which has
only a message capability lacks this significant mode of communication. We can
make the assumption that the office system is augmented by a central data base
system which provides a broadcast type of communication. However, in this case
we have two kinds of information, i.e., information in the office system and
information in the data base management system. To obtain integration we should
allow the user to view the data in the office stations as a global data base.
This implies the necessity of a distributed query capability on a more global
scale.

We have designed and are implementing global queries in our environment
[Rabitti 1981]. The restrictions placed on our prototype system make the
implementation simpler. First, forms are not generated and deleted arbitrarily.
For that reason forms will not be missed because they were deleted while the
global query was being serviced. Second, mail always goes through the control
node. In this way the control node can orchestrate the local queries without
extensive locking. Third, each form instance has exactly one identifiable,
official, original copy. These restrictions simplify many complicated problems
of multiple copies, consistency, synchronization and locking. This operation
provides a very limited case of distributed data base processing. Only global
queries are allowed. Global modifications are not allowed. In addition, the
presence of the control node with all the information it retains simplifies
implementation.


3.  FORM  PROCEDURES

3.1.  Environment

The form operations outlined in the previous sections are initiated directly
by the user. There are many situations, however, where we want to specify
actions to be performed automatically by the system under certain conditions.
There are many different terms used for such actions, e.g., agents, data base
procedures, office procedures, etc. In effect, we want a specification ability
for generalized procedures. These procedures are initiated when certain
preconditions are met, perform some actions and test some postconditions. If
the postconditions are met they terminate with a set of "success" operations.
If the postconditions are not met they terminate with a set of "failure"
operations. Many different specification languages have been suggested
providing office oriented procedures [Zisman 1977, Mylopoulos and Wong 1980,
De Jong 1980, Zloof 1980].
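
The shape of such a generalized procedure can be written down in a few lines.
The following is a minimal sketch, not taken from any of the cited languages:
the function run_procedure, its parameter names and the invoice example are
all hypothetical.

```python
def run_procedure(state, precondition, action, postcondition,
                  on_success, on_failure):
    """Run one generalized procedure against a mutable state dictionary.

    Returns None when the precondition does not hold, otherwise the value of
    the "success" or "failure" operation that terminated the procedure.
    """
    if not precondition(state):
        return None
    action(state)
    if postcondition(state):
        return on_success(state)
    return on_failure(state)


# Hypothetical example: pay an invoice that has not been paid yet.
state = {"balance": 100, "invoice": 30, "paid": False}

outcome = run_procedure(
    state,
    precondition=lambda s: not s["paid"],
    action=lambda s: s.update(balance=s["balance"] - s["invoice"], paid=True),
    postcondition=lambda s: s["balance"] >= 0,
    on_success=lambda s: "filed",
    on_failure=lambda s: "returned to sender",
)

# The precondition no longer holds, so a second attempt does nothing.
second = run_procedure(
    state,
    precondition=lambda s: not s["paid"],
    action=lambda s: s.update(paid=True),
    postcondition=lambda s: True,
    on_success=lambda s: "filed",
    on_failure=lambda s: "returned to sender",
)
```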

There are mainly two design choices for a facility for office procedure
specifications. First, we need to decide what capabilities to provide in the
specification. Second, we need to decide on the way of presenting this facility
to the user. The generality of the specification is closely related to its
goal. If it is mainly a requirements specification facility without plans for
implementation, it can be very general and powerful, e.g., OSL [Hammer and
Kunin 1980]. If an implementation is desirable then some of the generality
needs to be sacrificed. For example, the specification language used in SCOOP
is less general but it has been implemented [Zisman 1977].

There is also a choice of implementation environment. If the facility is
implemented in LISP or some other powerful Artificial Intelligence tool, then a
powerful specification environment can be put together with a reasonable effort
[Attardi et al 1980]. The problem, however, with such an approach is to achieve
an acceptable level of performance. If the facility is implemented in a regular
software environment, then the implementation effort is considerable. As a
result, the facility is rather limited, but the performance is acceptable.

The second design choice relates to the user environment. If the
specification facility is used by programmers, then it can resemble a
programming language. If the specification facility is mainly geared for
office workers with minimum programming expertise, then it should incorporate
a very simple user interface.


In our environment we chose to implement a limited facility using UNIX. We
also hope that our facility can be used by nonprogrammers. For this reason it
presents to the user a simple interface. In this environment office workers can
specify automatic office procedures based on forms. In the sequel we will
concentrate on the functions and implementation of our prototype system.


3.2.  Interface 

The specification of an automatic procedure in our prototype system is
provided by TLA (Toronto Latest Acronym) [Hogg 1981, Nierstrasz 1981]. TLA
bears resemblance to SBA and OBE [De Jong 1980, Zloof 1980]. The precondition
segment of a procedure is like a QBE query with forms instead of tables as the
data objects. Preconditions in TLA describe what, when and from where. For each
procedure preconditions define a working set of forms. The working set may
include forms that come only from certain stations, forms local to the station
specifying the procedure, or forms that have just been processed by another
automatic procedure. We may also specify that a procedure is to run only at
certain times. The appearance of a value in a field of a precondition indicates
that the value is to be matched. The action segment is similarly related to
forms. The appearance of a value in an action indicates that the value is to be
inserted in the appropriate field. The order in which forms needed by a
procedure arrive is not important. The order in which actions are performed is
not specified in detail. TLA merely ensures that the procedure be logically
consistent. The specification is non-procedural. The user indicates what forms
are to be collected, and what is to be done with them. He does not specify how
they are to be collected or how the actions are to be performed.

A TLA procedure is a collection of "sketches". A sketch resembles a form,
but is to be distinguished from form types or form instances. A form
precondition sketch indicates a request to the system to find "a form that
looks like this". An action sketch indicates a request to modify a form that
has already been obtained. A form sketch describes in either case a form
instance before or after processing by the procedure. The medium of a sketch
specification is the same form template of the form instance being described.
Actions and preconditions which do not refer to information found on the face
of a form are specified by sketches of "pseudo-forms". For example, the
condition that a procedure process only forms coming from user "John" must be
indicated on a special "source sketch".

Form sketches are used to capture the restrictions referring to values that
appear in the forms in the working set. Local restrictions are constant
attribute values, sets or ranges of values, and relations between values of the
attributes on a given form. The local restrictions refer only to the values
appearing on the face of a single form in the working set. TLA tries to
determine whether a given form satisfies the local restrictions (including the
source condition) for some sketch in some automatic procedure. If it does, TLA
notes that condition and attempts to match that form with other forms to obtain
a complete working set for that procedure. For example, in Figure 3.1 a
precondition sketch instructs TLA to watch for order forms requesting Borsalino
hats.

Global restrictions on the working set of an automatic procedure are the
join conditions between values of attributes appearing on different forms. One
expects all the forms in a procedure's working set to be linked by certain
relationships of their attribute values. For example, matching attribute values
can model many applications of automatic procedures. Each sketch in a procedure
has a name assigned by the user. This name is appended to the field name. In
this way a field of a different sketch can be referenced within a sketch. Using
this feature we can link different sketches. For example, in Figure 3.2 a
precondition sketch matches an order with an inventory form. Notice that the
linking conditions can appear in either sketch.

We can also restrict the source of mail being processed by an automatic
procedure. This is effected using an origin pseudo-form sketch. Forms may
thus be processed differently depending on their point of origin.

All form modification actions are indicated on form sketches. Every form
manipulated by a form procedure usually has a precondition sketch and an
action sketch. Actions which do not concern themselves with attribute values
must be expressed via pseudo-forms.

The action form sketch indicates all insertions and updates to the form.
The values to be inserted may be constant values, e.g., an authorization,
copied attribute values, or possibly function calls to application programs. We
distinguish, therefore, between the original and the updated value of any
attribute. An attribute which must be copied to another form may itself be
modified, and the wrong value must not be used. Furthermore, the function calls
may access both the original and updated values of attributes. In fact, the
original value of an attribute will often be one of the arguments to a function
call that updates that attribute.

The action sketch of Figure 3.3 illustrates several features. The price of an
item is filled in by copying it from an inventory form. A program called "mult"
is called to calculate the total. Finally, the original value of Quantity is
accessed whereas the updated value of Price is used. Note that special symbols
(the last of them "!") are used to access functions, original field values and
updated field values, respectively. If none of these symbols is used, a
constant string value is inserted.
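
The distinction between original and updated values can be made concrete with
a short sketch. It is an illustration only: TLA expresses these actions on
form sketches, whereas the tuple encoding of actions, the apply_actions
function and the field values below are assumptions made for the example.

```python
def mult(a, b):
    """Stand-in for the application program called "mult" in the text."""
    return a * b


def apply_actions(original, actions):
    """Apply field actions to a copy of original, keeping both versions.

    Each action is (field, kind, payload) where kind is one of
      "const": payload is the value to insert,
      "call":  payload is (function, [(version, field_name), ...]) and
               version is "original" or "updated".
    Actions are applied in the order given.
    """
    updated = dict(original)
    versions = {"original": original, "updated": updated}
    for field_name, kind, payload in actions:
        if kind == "const":
            updated[field_name] = payload
        elif kind == "call":
            function, arguments = payload
            values = [versions[version][name] for version, name in arguments]
            updated[field_name] = function(*values)
        else:
            raise ValueError(f"unknown action kind: {kind}")
    return updated


order = {"Item": "hat", "Quantity": 3, "Price": None, "Total": None}
inventory = {"Item": "hat", "Price": 25}

new_order = apply_actions(order, [
    ("Price", "const", inventory["Price"]),  # copied from the inventory form
    ("Total", "call", (mult, [("original", "Quantity"), ("updated", "Price")])),
])
```

Keeping the original form untouched while building the updated copy is what
lets one action read the old value of a field that another action overwrites.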


Follow-up actions performed after all forms are modified include copying of
forms, attaching forms to dossiers and shipping forms to other workstations.
Each of these is expressed on a pseudo-form sketch. A weak sort of
postcondition is available by employing a function call to decide the shipping
destination, the number of copies to be made, and so on. General postconditions
can only be achieved by cooperating form procedures which accept different
cases of the working set of forms.

Suppose, for example, that the processing of an order causes the quantity
of an item in stock to decrease below a certain acceptable level. We may wish,
at this point, to send a memo to the manager initiating an increase in the
production of the item. The procedure which processes orders in TLA is
incapable of conditionally producing this memo as a postcondition to inventory
update. It could unconditionally produce such a memo and then functionally
decide to mail it either to the manager or to a garbage collection station. A
cleaner approach, though, is to have a separate procedure which searches for
low inventory items, and then sends the memo.


With this approach individual tasks are clearly identified. Automatic
procedures are simple and completely devoid of any control flow. Furthermore,
the implementation is simpler because postconditions correspond to separate
procedures. The low inventory checker, for example, is only invoked when an
inventory form is updated.
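
The split into two cooperating procedures might look as follows. This is a
hedged sketch: the function names, the REORDER_LEVEL threshold and the outbox
list are hypothetical. The only test in the checker is its own precondition;
the order procedure itself contains no branch.

```python
REORDER_LEVEL = 10  # hypothetical threshold below which stock is "low"


def process_order(order, inventory):
    """Order procedure: update the inventory form; no postcondition is tested."""
    inventory["in_stock"] -= order["quantity"]


def low_inventory_checker(inventory, outbox):
    """Separate procedure; its precondition is an inventory form with low stock."""
    if inventory["in_stock"] < REORDER_LEVEL:
        outbox.append(("manager", "increase production of " + inventory["item"]))


def on_inventory_update(inventory, outbox):
    """Trigger: run every procedure that watches inventory forms."""
    low_inventory_checker(inventory, outbox)


inventory = {"item": "hat", "in_stock": 12}
outbox = []

process_order({"item": "hat", "quantity": 1}, inventory)
on_inventory_update(inventory, outbox)      # 11 in stock: no memo
process_order({"item": "hat", "quantity": 5}, inventory)
on_inventory_update(inventory, outbox)      # 6 in stock: memo sent
```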


3.3.  Implementation 


An automatic form procedure in TLA is specified by a collection of sketches
and as such describes what is to be done rather than how to do it. The sketch
representation is convenient for the user. This format, however, is not useful
for implementation. The specification must be mapped into a structure which is
helpful at run time. In TLA all local constraints of forms of the same type are
extracted from all procedures and stored in a common file. This file is used to
check all local constraints for a given form for all procedures in which it may
participate.
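
One way to picture the common file of local constraints is as an index keyed
by form type. The sketch below is an assumption-laden illustration: TLA stores
the constraints in files, while the in-memory dictionary, the function names
and the sample procedures here are hypothetical.

```python
from collections import defaultdict

# form type -> list of (procedure name, sketch name, constraint function)
local_constraints = defaultdict(list)


def register_sketch(form_type, procedure, sketch, constraint):
    """Extract one precondition sketch's local constraint into the index."""
    local_constraints[form_type].append((procedure, sketch, constraint))


def matching_sketches(form_type, form):
    """Return every (procedure, sketch) whose local constraint the form meets."""
    return [(procedure, sketch)
            for procedure, sketch, constraint in local_constraints[form_type]
            if constraint(form)]


register_sketch("order", "fill_order", "ord", lambda f: f["item"] == "hat")
register_sketch("order", "bulk_discount", "big", lambda f: f["quantity"] >= 100)
register_sketch("inventory", "fill_order", "inv", lambda f: f["in_stock"] > 0)

matches = matching_sketches("order", {"item": "hat", "quantity": 2})
```

A single pass over the entries for one form type checks the form against every
procedure in which it may participate.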

We cannot predict when the forms required to trigger a form procedure
may arrive. The processing must, therefore, of necessity be broken into
distinct parts. When TLA is notified of the availability of a form for
automatic processing, it first checks whether the form matches the local
conditions of any precondition sketch for that form type. The local conditions
are comprised of the source restriction and the field constraints. Suppose that
a form does match the local constraints of one or more precondition sketches.
That form is then a candidate for a working set for some procedure(s). It is
immaterial whether or not a working set including that form is complete. There
is always the possibility that later on the missing forms of the working set
will arrive. TLA records the match and waits.

Generally forms will not arrive simultaneously. Even if forms arrive
together, the processing of the forms is sequential. TLA treats each form
individually. A locking algorithm guarantees that forms cannot be processed
concurrently at a given workstation.

After the local constraints have been matched for a form, TLA checks link
conditions between the corresponding sketches of the procedure. The link
conditions are stored in files by procedure. TLA will wait and assemble forms
until all the linking conditions are satisfied. Actions are performed only once
a working set of forms has been compiled. Actions are stored in a separate
file. TLA preprocesses procedures to check the legality of actions and to
determine a legal order of execution if one exists. No further run time
analysis is performed. Actions run to completion.
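
Determining a legal order of execution can be viewed as a topological sort of
the actions. The sketch below assumes, as a simplification not stated in the
text, that the only ordering constraint is that an action reading the updated
value of a field must run after the action that writes it. It uses the
standard graphlib module (Python 3.9 or later); legal_order is a hypothetical
name.

```python
from graphlib import CycleError, TopologicalSorter


def legal_order(actions):
    """Return target fields in a legal execution order, or None if none exists.

    actions maps each target field to the set of fields whose *updated*
    values its action reads.
    """
    sorter = TopologicalSorter()
    for target, reads in actions.items():
        # Only dependencies on fields that are themselves updated matter.
        sorter.add(target, *(name for name in reads if name in actions))
    try:
        return list(sorter.static_order())
    except CycleError:
        return None


# Total reads the updated Price, so Price must be filled in first.
order_of_actions = legal_order({"Total": {"Price"}, "Price": set()})

# Two actions that each need the other's updated value have no legal order.
no_order = legal_order({"A": {"B"}, "B": {"A"}})
```

Because the check is done once, when the procedure is preprocessed, nothing
needs to be analysed again at run time.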

The TLA automatic procedure interpreter which assembles the working set
of forms is triggered upon receipt of mail, form creation and form
modification. Since the last two are initiated by the user, triggering in these
cases involves only the spawning of a new process. In the first case, however,
the interpreting process is initiated by the user who sent the mail. Mail may
arrive while the interpreter is running. It therefore continues to process all
mail until it discovers an empty mail tray.

Automatic procedures are meant to run regardless of whether the user to
whom the corresponding station belongs ever signs on after the procedure is
written. Mail in our system is routed through a control node. The sending
station sends a message to the control node consisting of the contents of the
form and the name of the station which is to receive the mail. The control node
then stores the form, updates the receiving station's mail tray and sends a
message to the recipient's station. At the recipient's station machine, the TLA
interpreting process is started. It communicates with the control node asking
for images of each new form in the recipient's mail tray. The interpreter
maintains files of form images for each form available for automatic
processing. It deletes the images when the forms have been processed either
automatically or by the user. The images are copies of the contents of each
form for use by the interpreter alone. Images are stored just as forms are
stored. The user, however, has no access to the images as forms. They may not
be modified, shipped away, or otherwise manipulated. They are not properly
forms or copies of forms, but merely images of forms.

The working set of a form procedure is abstracted in terms of a sketch
graph with the sketches as coloured nodes, and the links of matching conditions
as edges in the graph. The form gathering algorithm must find corresponding
forms and satisfy matching conditions of the sketch graph. An instance graph is
generated associated with the forms present. The interpreter tries to match the
sketch graph in the instance graph. The instance graph may look different from
the sketch graph, and the correspondence must be established carefully.


When a form is passed to the interpreter, it first reads the file of local
constraints for the forms of that type. Whenever a match is found, the
interpreter notes which sketch of which procedure is matched by the form, and
it enters a tuple consisting of the form type, the form key, the procedure and
the sketch matched into a relation called "NODE". The file of global
constraints for the procedure matched is then read. For every link concerning
the matched sketch, TLA establishes whether the current form satisfies the join
conditions with any of the forms previously recorded in the NODE relation. For
every new link found, TLA inserts a tuple into another relation called EDGE.
EDGE records the form keys, types, sketch names and procedure name of every
link established.
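
The bookkeeping just described can be sketched as follows. The tuple layouts,
the Link type and the record_match function are assumptions made for
illustration; in particular the EDGE tuple here omits the form types, and
Python lists stand in for the two relations.

```python
from typing import Callable, NamedTuple


class Node(NamedTuple):
    form_type: str
    form_key: int
    procedure: str
    sketch: str


class Edge(NamedTuple):
    procedure: str
    sketch_a: str
    key_a: int
    sketch_b: str
    key_b: int


class Link(NamedTuple):
    """A global constraint between two sketches of one procedure."""
    procedure: str
    sketch_a: str
    sketch_b: str
    join: Callable[[dict, dict], bool]  # join(form_a, form_b)


NODE, EDGE = [], []   # the two relations
forms = {}            # form key -> field values, standing in for form images


def record_match(node, form, links):
    """Insert node into NODE and add an EDGE tuple for every link it satisfies."""
    forms[node.form_key] = form
    for other in NODE:
        if other.procedure != node.procedure:
            continue
        for link in links:
            if link.procedure != node.procedure:
                continue
            if (link.sketch_a, link.sketch_b) == (node.sketch, other.sketch):
                if link.join(form, forms[other.form_key]):
                    EDGE.append(Edge(node.procedure, node.sketch, node.form_key,
                                     other.sketch, other.form_key))
            elif (link.sketch_a, link.sketch_b) == (other.sketch, node.sketch):
                if link.join(forms[other.form_key], form):
                    EDGE.append(Edge(node.procedure, other.sketch, other.form_key,
                                     node.sketch, node.form_key))
    NODE.append(node)


links = [Link("fill_order", "ord", "inv", lambda o, i: o["item"] == i["item"])]
record_match(Node("order", 1, "fill_order", "ord"), {"item": "hat"}, links)
record_match(Node("inventory", 7, "fill_order", "inv"), {"item": "hat"}, links)
record_match(Node("inventory", 8, "fill_order", "inv"), {"item": "coat"}, links)
```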

The NODE and EDGE relations describe the instance graph with forms as
nodes and links between them as edges. The nodes are coloured according to
which sketch the form matches. If a form matches two or more distinct
sketches in one or more procedures, it is multiply represented, once for each
sketch. Procedure names partition the instance graph, since there can be no
links between sketches of different procedures. For each partition we wish to
match the sketch graph that describes the working set of forms for that
procedure. Nodes are assigned a unique colour for each sketch, and the
corresponding colours are used in the instance graph. An instance of the sketch
graph, then, must be found within the instance graph.


The relationships between the forms in the working set of a procedure are
usually expressed in terms of the link conditions. The sketch graph will
generally be connected. The instance graph, however, will more often consist of
several partially complete working sets of forms, and so will usually be
disconnected.

If the join conditions imposed on the working set of forms are "nice" then
each connected subgraph of the instance graph will also be a subgraph of the
sketch graph. It is conceivable, however, that two forms satisfying a
precondition sketch may each satisfy a link condition with a third form
satisfying a second sketch in the same procedure. This anomaly will occur
either if the imposed join conditions are "not nice enough", or if duplicate
forms are inadvertently created and passed through the system. In this case,
the connected subgraphs of the instance graph are not as simply related to the
sketch graph. Thus, establishing when a complete working set of forms has been
compiled requires careful analysis.

If a copy of the sketch graph is identified in the instance graph, then a
working set has been found. The form procedure is then executed, and the
corresponding nodes and edges are purged from the instance graph. No more
complete working sets then remain. When a new form arrives, a working set of
forms may be completed only if that new form is included. The analysis of the
instance graph, then, need only concern the connected subgraphs which include
nodes representing the new form.

Link conditions giving rise to sketch trees seem natural, since a "nice"
description of the relationships between sketches would contain no cycles. If A
is related to B and B is related to C, then one would hope not to find any other
relationship holding between A and C. In practice, however, things may not be
that simple. Link conditions might give rise to cycles, or even disconnected
sketch graphs.

The algorithm which searches the instance graph for a copy of the sketch
graph employs a list of potential working sets. Initially, there exists a single
such set containing only the key of the newly added form. Edges are traversed
in the instance graph and keys are added to each set until all the edges and
nodes in the sketch graph have been checked [Nierstrasz 1981].
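
A search of this kind can be sketched as below. This is not the algorithm of
the cited report, only a minimal illustration in its spirit: the function
name, the set-based graph encoding and the rule that one form fills at most
one sketch of a working set are assumptions.

```python
def find_working_set(sketches, sketch_edges, nodes, edges, new_node):
    """Return a dict sketch -> form key for a complete working set, or None.

    sketches:     list of sketch names of one procedure
    sketch_edges: set of frozenset({sketch_a, sketch_b}) link conditions
    nodes:        set of (sketch, key) pairs in the instance graph
    edges:        set of frozenset({(sketch_a, key_a), (sketch_b, key_b)})
    new_node:     the (sketch, key) pair of the newly added form
    """
    new_sketch, new_key = new_node
    candidates = [{new_sketch: new_key}]        # potential working sets
    for sketch in sketches:
        if sketch == new_sketch:
            continue
        extended = []
        for partial in candidates:
            for node_sketch, key in sorted(nodes):
                if node_sketch != sketch or key in partial.values():
                    continue
                # Every link between this sketch and an assigned one must hold.
                consistent = all(
                    frozenset({(sketch, key), (other, partial[other])}) in edges
                    for other in partial
                    if frozenset({sketch, other}) in sketch_edges
                )
                if consistent:
                    extended.append({**partial, sketch: key})
        candidates = extended
    return candidates[0] if candidates else None


sketches = ["ord", "inv"]
sketch_edges = {frozenset({"ord", "inv"})}
nodes = {("ord", 1), ("inv", 7), ("inv", 8)}
edges = {frozenset({("ord", 1), ("inv", 7)})}

found = find_working_set(sketches, sketch_edges, nodes, edges, ("inv", 7))
missing = find_working_set(sketches, sketch_edges, nodes, edges, ("inv", 8))
```

Starting every candidate from the new form reflects the observation that a
working set can only be completed by the form that has just arrived.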

The station's owner may attempt to move some of the forms in the working
set while the interpreter is running. Each of the forms must therefore be set
aside. If any of the forms cannot be found, then the interpreter restores all
the forms retained thus far, and does not initiate the form procedure.
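
The all-or-nothing reservation of a working set can be sketched in a few
lines. The dictionary standing in for the user's form file and the function
reserve_working_set are hypothetical.

```python
def reserve_working_set(form_file, keys):
    """Move the forms with the given keys out of form_file, all or nothing.

    Returns the dict of reserved forms, or None if any form was missing, in
    which case every form taken so far is put back.
    """
    reserved = {}
    for key in keys:
        if key not in form_file:
            form_file.update(reserved)   # restore what was set aside so far
            return None
        reserved[key] = form_file.pop(key)
    return reserved


complete_file = {1: "order", 7: "inventory"}
taken = reserve_working_set(complete_file, [1, 7])

partial_file = {1: "order"}              # form 7 was moved by the owner
refused = reserve_working_set(partial_file, [1, 7])
```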

Partly completed working sets of forms may or may not have a particular
meaning in terms of exceptions and errors. If forms are "missing" from a
working set, the present forms may also be part of another working set. The
missing forms would determine which procedure is to be activated. There is no
way of telling which procedure's forms are missing until they arrive. Missing
forms may never arrive. There is no way of interpreting their absence as an
error, except by placing some arbitrary time limit upon form gathering.

Forms may satisfy partly completed working sets for a number of
procedures. There is a need for some convenient way of displaying these sets.
Users could interpret what is "missing" and possibly act on this information.
Instance graphs can be quite complicated. Several partly completed sets may
overlap in a single instance graph. A graphic display would present this
information very nicely.

A simple feature that would increase user interaction with automatic
procedures would be a function whose value is determined by the user. When the
interpreter sees this function assigned to a field in an action sketch, it holds
all the forms in the working set. It then notifies the user when he next signs
on, and waits until the user makes a request to inspect the working set. At
that point the user is allowed to assign a value to the field (or possibly
abort the procedure), and then execution will resume.

Form flow between stations in TLA is determined by the interplay of
automatic procedures. Flow of execution could be made more explicit by
passing control between procedures in different stations. One could then pass
working sets of forms between procedures. In this way we could explicitly
determine the order of operations. Procedures could then be called from other
procedures without the need for form gathering. Decision points could be
modelled by branching rather than by a variety of similar working sets of
forms.

A form procedure is meant to capture the notion of an office worker
collecting forms at his desk until a "complete set" is compiled. He can then
process the forms and file them or send them on their way. Processing of the
collection of forms may cause forms to be modified or new forms to be added to
the set. The context of OFS limits the range of possible actions upon forms.
There are also many things that persons can do with OFS which have not been
modelled in TLA. We have sacrificed generality in TLA in order to keep a simple
and easy to use interface. Other systems are more general but they require a
more elaborate workstation for their implementation and more sophistication in
their usage.

TLA does not assume any knowledge of the system state other than what is
available to the user in his form file or his mail tray. This corresponds to the
notion in OFS that users can only manipulate the forms that they "own".
Anything happening outside their own workstation does not concern them. The
domain of automation is that of the individual workstation. The complexity of
determining when to trigger a procedure is thereby considerably reduced.

Another aspect of automation supplied by TLA is that of "smart forms"
which automatically fill in certain fields of a form with previously filled-in
fields as arguments. The domain here is that of the form alone, so triggering
takes place whenever a form value is entered or modified. "Smarter forms" with
fields that change value depending upon time conditions, the state of the
system, or any other variable, were not implemented. Some "smarter form"
problems can be solved with TLA's automatic procedures.
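
A "smart form" in this sense can be sketched as a form whose derived fields
are recomputed whenever a value is entered or modified. The SmartForm class
and its rule format are hypothetical, and the single refresh pass below does
not handle rules that depend on other derived fields.

```python
class SmartForm:
    """A form whose derived fields are recomputed on every entry."""

    def __init__(self, rules):
        # rules: target field -> (function, list of argument field names)
        self.rules = rules
        self.values = {}

    def set(self, field_name, value):
        """Enter or modify a value, then trigger the fill-in rules."""
        self.values[field_name] = value
        self._refresh()

    def _refresh(self):
        for target, (function, arguments) in self.rules.items():
            if all(name in self.values for name in arguments):
                self.values[target] = function(
                    *(self.values[name] for name in arguments))


form = SmartForm({"Total": (lambda quantity, price: quantity * price,
                            ["Quantity", "Price"])})
form.set("Quantity", 3)
total_before_price = form.values.get("Total")   # Price is not filled in yet
form.set("Price", 25)
total_after_price = form.values["Total"]
form.set("Quantity", 4)
total_after_change = form.values["Total"]
```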

Both OFS and TLA have been implemented in the same environment.
Compatibility with OFS was maintained in TLA. Changes to code and the internal
representation of an OFS system were mostly additions of modules and file
directories. Where existing files and code were modified, compatibility was
maintained, so that OFS would simply ignore the added TLA features. Conversion
costs from an OFS system to one that supports TLA are negligible, and any TLA
system could be run with the OFS subset.

We have outlined a form procedure specification facility which utilizes forms
as an interface to the user. Notice that this is not a case of the same form
instance being looked at through different templates. This is a case where the
same template is used for three different purposes. First, it is used to
portray a form instance itself. Second, it is used to portray a precondition on
a form instance. Third, it is used to portray an action on the form instance.

4.  FORM  FLOW 

4.1.  Form  models 

Form management is not complete without discussing some models for
understanding, designing and analyzing form operations in offices. Office
information systems should be carefully designed and analyzed. The
introduction of automatic form procedures may generate many unwanted situations
if the system is not well designed. For instance, forms can circulate forever,
forms can get stuck at the wrong stations, etc. In addition, the structure of
the office should be carefully studied, e.g., who is doing what, who is sending
forms to whom. In this way restructuring can take place for many desired goals.
For instance, we may want to enforce logical properties in the office, e.g., a
form receives all the right authorizations. In addition, we may want to
restructure to minimize cost functions, e.g., forms do not follow paths without
reason, they do not bottleneck at stations, etc.

There are two broad uses of models. One type of model is mainly
descriptive, e.g., OSL [Hammer and Kunin 1980]. This type of model serves
mainly the function of requirements specification. It has a component to
describe structures of information in the office. This component resembles
closely many data models proposed for data base management [Tsichritzis and
Lochovsky 1981]. In addition, it has a functional specification component. In
this way the processing and the movement of information units in the office can
be described. This component has operations specifications and a way of
portraying flow. The usual way to portray flow is using some asynchronous
parallel operations model, e.g., Petri nets [Peterson 1977]. Petri nets as
originally defined do not have coloured (typed) stones and do not portray
matching conditions. Petri nets need to be augmented to portray the matching,
data dependent operations which affect flow in the office [Zisman 1977]. For
instance, an application is processed when the matching letters of
recommendation arrive, not just any letter. Another Petri-net like model for
office description is Information Control Nets (ICN's) [Ellis 1979, Cook 1980].

In the case of form management a descriptive model is not very complex.
The data modelling aspects draw heavily from data models. After all, forms can
be thought of as interpreted data. The entry procedures mirror constraint
specification and operation augmentation as it exists in some data models, e.g.,
triggers in System R [Astrahan et al 1976]. The operations on forms are directly
related to data model operations, e.g., find forms, modify form entries, etc. The
only aspect of form management which is not captured by data models is form
flow. For that reason there is a need for a model of form flow between stations
[Ladd and Tsichritzis 1980]. We are also working on a visual tool for portraying
form flow. In this way the model can be illustrated in an interactive
environment to aid office system specification.

A separate, very important use of models deals with analysis of office form
systems. In this case the properties of the system should be sufficiently
abstracted into a mathematical framework which can be further analyzed.
Different kinds of mathematical frameworks may be important to analyze
different aspects of the system. Consider the case of form management. The
static structure of the station network gives us a graph of form flow. Let's
suppose that the decisions of local routing of forms and the operations on them
can be described in closed form. The paths that forms can take and the combined
operations on these paths can be analyzed in certain cases using graph
theoretic algorithms. We could analyze, for instance, whether forms get into
infinite loops, or whether forms pass through the right stations. Attaching
volumes to form types we can analyze the network as a commodity flow network.
This type of analysis is the subject matter of the following sections. It should
be pointed out that the analysis applies equally well to automated and manual
systems.

4.2.  Abstraction 

Consider an office comprised of office stations $S_1, \ldots, S_N$. For this
section it does not much matter whether the stations represent persons playing a
particular role in the organization, or workstations out of which different
persons can operate in similar manner. A station is an abstract entity which
relates a person, a role the person plays and a physical location and device
through which he operates. Messages flow between the stations as persons perform
their duties. We will assume that we are dealing with messages having a certain
regularity which is captured by forms. Forms are of different types. Each type
of form has a set of attributes $X, Y, Z, \ldots$. In order to simplify our
investigation we will assume that form attributes take a single value for each
instance of the form. An instance of the form has particular values for each
attribute. The set of values will be denoted by $x, y, z, \ldots$. For the
purpose of analysis the set of attribute values of a form instance will
represent that form instance. We will refer to form instances from now on as
simply forms.

Many forms flow between the office stations. We will assume in this section
that the progress of each form is independent of other forms. This is obviously
not true in real systems where there is much coordination between form
instances. For instance, in our automatic form procedures we can define
working sets of forms which are needed before we proceed with any action. Such
coordination problems can be treated with models but they will not be discussed
in this framework [Ellis 1979, Zisman 1977]. Since we will not deal with
coordination we can treat forms of each type separately.

A form originates in a particular station and it is passed from station to
station. A particular station may keep the form, in which case the form's
circulation is terminated. A form's deletion will also be treated as
termination. A form's duplication will be treated as a generation of a new form.
We will assume that the stations are consistent in processing the forms. That
is, if a form comes back to a station with identical values for its attributes
it will be treated the same way as previously. This situation may give rise to
indefinite loops, in which case the form will circulate forever in the office.
Such behavior should obviously be avoided.

The action performed on a form in a station may or may not be
deterministic. When the decision about processing the form cannot be captured
inside the model we have a case of nondeterminism. For instance, suppose that
the treatment of a form depends on human intuition, or outside information, or
complex decision criteria. We can then say something about the choices, but not
pin down exactly the action. In such cases we have to analyse potential behavior
rather than sure behavior. We will restrict our scope to the deterministic case.

We will associate with each station a set of action functions and a set of
routing predicates.

The action associated with a station $S_i$ has two components. One aspect
is represented by a set of functions mapping the values of the attributes into
new values. We will denote by $A_i(x_1, \ldots, x_n)$ the set of new values that
the form takes after it is subjected to the action of the station $S_i$. The
second aspect captures any actions of the station which are not reflected in the
form's contents. They do not, therefore, affect the routing of the forms and the
actions of other stations.

The routing of forms associated with a station $S_i$ will be represented by a
set of predicates $P$ on the values of the form. $P_{\alpha i}$ is true (equal
one) for all forms originating in station $S_i$. A predicate $P_{ij}$ is defined
for each station $S_j$ connected with station $S_i$ and it is true (equal one)
for any form which is routed from station $S_i$ to station $S_j$. Finally,
$P_{i\omega}$ is true (equal one) for every form which remains in station $S_i$.
We will expect that for every station $S_i$ the defined predicates $P_{ij}$,
$P_{i\omega}$ are such that $\sum_j P_{ij} + P_{i\omega} = 1$. In the
deterministic case only one of the predicates is true. That is, a form after
coming to a station $S_i$ either goes to another station or stays forever in
that station.

Consider a form instance $(x_1, \ldots, x_n)$. The form instance will take the
path $S_i S_j \cdots S_k S_l$ (it will originate in station $S_i$ and go through
all the stations up to $S_l$) iff:

$$P_{\alpha i}(x_1, \ldots, x_n) = 1$$

$$P_{ij} A_i (x_1, \ldots, x_n) = 1$$

$$P_{kl} A_k \cdots A_j A_i (x_1, \ldots, x_n) = 1$$

The final values of the form after the actions of $S_l$ will be

$$A_l A_k \cdots A_j A_i (x_1, \ldots, x_n)$$
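The path condition can be checked mechanically by alternating actions and routing predicates. The following is a minimal sketch, not part of the original report: the function name `takes_path`, the dictionaries, and the loan-style example data are all hypothetical, and it assumes that each routing predicate is evaluated on the values produced by the action of the sending station.

```python
from typing import Callable, Dict, List, Tuple

Form = Tuple[int, ...]                 # attribute values (x_1, ..., x_n)
Action = Callable[[Form], Form]        # A_i: values in, new values out
Predicate = Callable[[Form], bool]     # P_ij or the origin predicate


def takes_path(form: Form,
               path: List[int],
               origin_ok: Dict[int, Predicate],
               actions: Dict[int, Action],
               routing: Dict[Tuple[int, int], Predicate]):
    """Return (True, final values) iff `form` follows `path` exactly."""
    # The form must be allowed to originate in the first station.
    if not path or not origin_ok.get(path[0], lambda f: False)(form):
        return False, form
    values = form
    for here, nxt in zip(path, path[1:]):
        values = actions[here](values)            # apply A_here
        if not routing[(here, nxt)](values):      # then test P_here,nxt
            return False, values
    values = actions[path[-1]](values)            # action of the last station
    return True, values


# Hypothetical two-station office: attributes are (amount, authorized).
origin_ok = {1: lambda f: True}
actions = {
    1: lambda f: (f[0], 1 if f[0] < 10000 else 0),   # set authorization
    2: lambda f: f,                                  # no change
}
routing = {(1, 2): lambda f: f[1] == 1}

print(takes_path((5000, 0), [1, 2], origin_ok, actions, routing))
print(takes_path((20000, 0), [1, 2], origin_ok, actions, routing))
```

The second call fails at the routing step because the hypothetical action of station 1 leaves the authorization attribute at zero for large amounts.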

In this framework we will investigate three types of problems.

1) Correctness

We are interested in determining whether some forms get into infinite
loops. This situation may represent abnormal behavior.

We are interested in characterizing the forms that end up at a specific
station. This information can be compared with our specifications regarding
termination of forms.

2) Form processing

We are interested in determining the paths that are followed by forms.

We are interested in defining the combined actions of the stations on forms
during their lifetime.

We are interested in evaluating the loads which form traffic poses on
stations and channels of communication.

3) Station restructuring

We are interested in equivalence preserving transformations which can be
performed on the station diagram.

We are interested in restructuring of stations for desired goals, e.g.,
maximize parallelism.

The main thrust of our approach is to define equivalence classes of forms.
All forms in the same class take the same path of stations. We will outline an
algorithm which categorizes the forms in classes and obtains the path for each
class. As part of the algorithm, we also isolate the forms which get into
infinite loops. The actions taken by forms in each class can be derived by going
down the path of the class. The loads can also be obtained from the
categorization of each class. Finally, a notion of equivalence is defined which
preserves the series of actions that the stations take for each form.

4.3.  Problem  Specification 

Using cycles of stations in the station diagram we can set up flows of forms
which closely resemble loops in programs. If the actions performed in each
station have a minimum of generality we can mirror operations performed by
statements of programs. Compositions of programs can be portrayed using
connection of station diagrams feeding the output of one as input to the other.
Using these analogies we can simulate computations of programs using individual
actions on stations and the control structures provided by form routing.
This situation implies that most of the problems we want to deal with are
analogous to problems of behavior of programs. For instance, a problem of
termination of programs can be mapped into a problem of a form being in an
indefinite loop. It is well known that program behavior problems are not only
hard but also unsolvable for the general case of programs. The use of reduction
indicates that many form flow problems will also be unsolvable. We should,
therefore, look at restricted cases of both actions (A's) and routing (P's) in
order to have a chance to solve interesting problems.

In a previous paper we looked at the special case where the routing
predicates P depended only on a set of attributes which were not allowed to
change during the lifetime of a form [Ladd and Tsichritzis 1980]. In addition,
the P's were assumed to be expressed by Boolean expressions on simple
conditions, attr <op> value, on the form attributes. Under these restrictions
most problems become tractable and the form paths can be completely categorized.
It is reasonable to assume that the P predicates are Boolean expressions.
This represents the fact that forms in each station are locally treated as
classes. They are acted upon and routed depending on their classes. In this
case, a class is not a type but an equivalence class of forms within that type.
The classes are defined according to ranges of their attribute values. For
instance, a loan application will be treated differently when the loan is below
10,000 dollars than when it is above 10,000 dollars. The restriction of not
changing the values of routing attributes is not realistic in practice. It
implies that all routing is based on fixed values of attributes which are
completely predefined. In real systems there are many situations in which an
authorization in the form of a signature will alter the path of a form. We
should, therefore, allow routing attributes to change. However, if we allow them
to change arbitrarily we can very quickly get into intractable situations. On
closer inspection, the routing attributes do not get arbitrary values. In most
realistic cases they are set to constant values. For instance, an authorization
is set to "authorized" or "not authorized". An application is set to "accept" or
"turned down". The value of the attribute is then used to determine different
paths.

The following restricted set of actions and routing predicates seems both
reasonable in terms of real systems and amenable to theoretical treatment, as
will be seen in the rest of the paper.

Consider the set of attributes in a form. We will single out those attributes
which directly or indirectly affect the path of a form and call them control
attributes.

Control attributes are the attributes which are either routing attributes (in
the definition of P's) or they are attributes on which the routing attributes
ultimately depend (according to the definition of A's). That is, an attribute
which is not a routing attribute can still control form flow if its value
affects the value of a routing attribute as a result of an action A inside a
station. For instance, the size of a loan application may affect an attribute
called authorization. Although the loan size is not directly used to route the
application, it is used indirectly through setting the attribute authorization.

We  restrict  our  scope  to  the  case  where: 

1) The routing predicates P are of the form of a Boolean expression
B($X_i$ <op> $u$) of simple conditions, where $X_i$ is a routing attribute,
<op> is one of the operators $=, \neq, \geq, >, \leq, <$ and $u$ is a value
for $X_i$. We also make the assumption that the domains of the values of the
attributes are ordered. However, we do not make the assumption that the
domains are bounded. In real practice they are of course bounded, but their
bounds are too high to handle them exhaustively.

For each predicate $P_{ij}$ (qualifying forms going from $S_i$ to $S_j$) there
corresponds a Boolean expression which without loss of generality can be
assumed to be in disjunctive normal form:

$$P_{ij} = \bigvee (C_1 \wedge C_2 \wedge \cdots \wedge C_n)$$

where $C_k$ is a conjunction of simple conditions involving attribute $X_k$.

2) The action functions on the control attributes can be expressed by a
decision table of control variables. The operations performed by the decision
table involve setting the values of attributes to constants. Attributes which
are not control attributes can have arbitrary actions performed on them as
expressed by the definitions of $A_i$'s. Their values do not affect either
directly or indirectly the routing of the forms. The actions also include
processing which does not directly affect the form's attributes and it will
not be treated in detail.


Each action $A_i$ (action in station $S_i$) has two parts. In the first part
all control attributes can be set to values. This operation is accomplished by a
decision table with entries of the form:

in the case $\bigvee (C_1 \wedge C_2 \wedge \cdots \wedge C_n)$ the variable is set, $X_k = u$

The conditions for setting a control variable are again in disjunctive normal
form and the C's are conjunctions of simple conditions on control variables. The
second part of the action, which is not used to set control variables, will be
denoted as $A_i$ and it will stay uninterpreted.

It is interesting to point out how we obtain the control attributes in this
framework. We have to go through an iterative algorithm. We first start with the
routing attributes, i.e., all attributes which appear in the routing predicates.
We define at this point the set of control attributes as consisting only of
routing attributes. We then look at all lines of the decision tables which set
control attributes. Any attribute appearing in a condition of such a line is
also a control attribute. We iteratively continue this procedure until we get no
more control attributes.
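The iteration just described is a fixed-point computation and can be sketched in a few lines. This is only an illustration under assumed data structures: `control_attributes` and the encoding of a decision-table line as a pair (attributes tested in its condition, attribute it sets) are hypothetical, not taken from the report.

```python
def control_attributes(routing_attrs, decision_lines):
    """Close the set of routing attributes under 'appears in a condition
    of a decision-table line that sets a control attribute'.

    decision_lines: list of (condition_attrs, target_attr) pairs.
    """
    control = set(routing_attrs)
    changed = True
    while changed:                       # repeat until nothing new is added
        changed = False
        for condition_attrs, target in decision_lines:
            new = set(condition_attrs) - control
            if target in control and new:
                control |= new           # these attributes influence routing
                changed = True
    return control


# Hypothetical tables: Y is tested to set Z, W is tested to set Y,
# and U is tested to set V (V never reaches a routing predicate).
lines = [({"Y"}, "Z"), ({"W"}, "Y"), ({"U"}, "V")]
print(sorted(control_attributes({"X", "Z"}, lines)))
```

With these hypothetical tables, `W` is picked up only on the second pass, after `Y` has become a control attribute, while `U` and `V` are never added.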

In figure 4 we present an example of a station diagram and the forms
circulating in it. The example involves three control attributes $X, Y, Z$. The
attributes $X, Z$ are routing attributes. The attribute $Y$ is a control
attribute because it is used to set $Z$ in an action of station $S_1$. In our
example any form can enter in $S_1$ ($P_{\alpha 1}$ is true). Forms only
terminate in $S_2$ and $S_3$. The values $y_1, y_2$ and $z_1, z_2, z_3$ are in
ascending order for each attribute.

Consider a control variable $X_k$ and all the simple conditions which appear in
$C_k$'s in all the definitions of the routing predicates $P_{ij}$ and the
actions $A_i$'s for all stations. Suppose there are $M_k$ such simple
conditions.

These simple conditions of the form, $X_k$ <op> value, divide the domain of
$X_k$ in $M_k + 1$ equivalence classes in the following way. For each condition
$X_k = u$ (or $X_k \neq u$) the value $u$ determines by itself an equivalence
class. We look now on the domain of values of $X_k$ minus the $m_k$ such
constant values.

Consider now the conditions $X_k \geq v$, $X_k > v'$, $X_k \leq w$, $X_k < w'$.
We order the values $v, v', w, w'$ according to the order of the domain. That
operation defines in a natural way $M_k - m_k + 1$ ranges of values of $X_k$.
The boundaries of the ranges depend on the operator. In the cases $X_k \geq v$
or $X_k < w'$, $v$ or $w'$ belong in the upper range. In the cases $X_k > v'$
or $X_k \leq w$, $v'$ or $w$ belong in the lower range.
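The boundary rule can be made concrete by treating each inequality as a cut placed just below or just above its constant. The sketch below is one possible encoding, with hypothetical names (`build_cuts`, `classify`) and numeric attribute values assumed for illustration; duplicate conditions collapse into a single cut, so the number of ranges can be smaller than the count given in the text.

```python
from bisect import bisect_left


def build_cuts(conditions):
    """conditions: list of (op, value) pairs for one control variable."""
    constants = sorted({v for op, v in conditions if op in ("==", "!=")})
    cuts = set()
    for op, v in conditions:
        if op in (">=", "<"):
            cuts.add((v, 0))      # cut below v: v joins the upper range
        elif op in (">", "<="):
            cuts.add((v, 1))      # cut above v: v joins the lower range
    return constants, sorted(cuts)


def classify(value, constants, cuts):
    """Return the equivalence class of `value`."""
    if value in constants:
        return ("const", constants.index(value))   # singleton class
    # Count the cuts lying strictly below the value.
    return ("range", bisect_left(cuts, (value, 0.5)))


# Hypothetical conditions on one attribute: X >= 10, X > 30, X == 20.
constants, cuts = build_cuts([(">=", 10), (">", 30), ("==", 20)])
for x in (5, 10, 20, 30, 31):
    print(x, classify(x, constants, cuts))
```

Every value in the same class gives every listed condition the same truth value, which is the property the rest of the analysis relies on.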

In the example of figure 4 we have ranges of attribute values as in figure 5.
There are three ranges for $X$ with breaking points $x_1$ and $x_3$. There are
two ranges for $Y$ with breaking point $y_1$. Finally, there are three ranges
for $Z$ with breaking points $z_1$ and $z_3$. Notice that both values $x_2$ and
$z_2$ are not breaking points of equivalence classes. They are not used in any
routing predicate or condition for setting values.

At this point we observe that the path of a form does not depend strictly on
the value of an attribute $X_k$ but on the equivalence class where the value
lies, or comes to lie as it is changed. All the values of $X_k$ within an
equivalence class make all simple conditions, $X_k$ <op> value, keep the same
truth value. If $X_k$ takes one of the $m_k$ constant values $u$ (in its own
equivalence class) then all simple conditions involving $X_k$ take a particular
truth value. If $X_k$ ranges within an equivalence class then all the simple
conditions have the same truth value because of the way the equivalence class
has been defined.

Let us assume that the ranges of values representing the corresponding
equivalence classes have been defined for all the control attributes
$X_1, \ldots, X_n$. We define the state of a form at a station as the tuple

$$(r_1, \ldots, r_n)$$

where $r_k$ denotes the range of the value of $X_k$ ($r_k$ takes values
$0, \ldots, M_k$).

We can argue at this point that if two forms have the same state at the same
station $S_i$ they will be treated identically from then on as far as routing is
concerned. This statement can be proved by induction on the path taken after
$S_i$. Since the two forms have all control variables in the same equivalence
classes, they will be routed the same way from $S_i$ to $S_j$. At $S_j$ the
decision table of the action part will perform the same operations, hence if any
values for the control variables are changed, they will be set to the same value
for the two forms. After this they will still be in the same state and so on.

It  follows  that  if  a  form  exits  the  same  station  twice  in  the  same  state  then 
it  will  cycle  indefinitely  since  the  subsequent  path  is  periodic. 

At this point we observe that there are $N \cdot \Pi (M_k + 1)$ different
possibilities for form states in stations, where $N$ is the number of stations
and $\Pi$ stands for the product over all $k$. Each possibility corresponds to a
form being in a station with its control variables in particular equivalence
classes. We conclude that if a form in our environment has a path longer than
$N \cdot \Pi (M_k + 1)$ then it will repeat the same state at the same station.
The form will, therefore, be in a loop. This situation suggests an algorithm
which determines whether a form is in an indefinite loop. We only have to follow
forms and record their states for paths up to $N \cdot \Pi (M_k + 1)$ long. In a
realistic situation the algorithm is not that inefficient. The overall number of
conditions on control variables cannot be that large (the routing logic will get
too difficult as a result). Observe also that the product
$N \cdot \Pi (M_k + 1)$ is largest when the conditions are equally distributed
between control attributes. This situation is probably rare.
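The loop test suggested here amounts to following one form and remembering every (station, state) pair it has occupied. The sketch below is an illustration only: `trace` and the `step` callback are hypothetical names, and the state is recorded on arrival at a station rather than on exit, which in the deterministic case detects the same loops.

```python
def trace(start_station, start_state, step):
    """Follow a form until it terminates or repeats a (station, state) pair.

    step(station, state) returns (next_station, next_state), or None when
    the form stays in the station (terminates).
    Returns (path, loop) where loop is None for a terminating form.
    """
    seen = {}                      # (station, state) -> position in path
    path = []
    station, state = start_station, start_state
    while True:
        key = (station, state)
        if key in seen:            # same state at the same station: a loop
            return path, path[seen[key]:]
        seen[key] = len(path)
        path.append(key)
        nxt = step(station, state)
        if nxt is None:
            return path, None
        station, state = nxt


# Hypothetical two-station diagrams; the state is a tuple of range indices.
def looping_step(station, state):
    return (2, state) if station == 1 else (1, state)


def finite_step(station, state):
    return (2, state) if station == 1 else None


print(trace(1, (0,), looping_step))
print(trace(1, (0,), finite_step))
```

Because there are finitely many (station, state) pairs, the dictionary `seen` can never grow beyond the bound given in the text, so the function always returns.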

We should point out that there are cases (albeit pathological) in which we
must look at long paths. These paths repeat stations many times, but they
change state. We cannot claim, therefore, that the form is in a loop. Consider,
as an example, two stations $S_1$ and $S_2$. The control attributes are
$X_1, \ldots, X_n$. Let's suppose that each $X_k$ is divided in $R + 1$ ranges
using $R$ simple conditions $X_k \geq u$. The routing of $S_1$ is such that it
sends to $S_2$ all forms except the ones where all $X_1, \ldots, X_n$ are in
their lowest range. In that latter case it exits the form from the system. The
station $S_2$ always sends the form back to $S_1$. The action of $S_2$ is a
complicated decision table. It sets the values of $r_1, \ldots, r_n$ down by 1,
where $r_1 \cdots r_n$ is looked at as a number base $R + 1$. In this example
some forms will pass from $S_2$ $(R + 1)^n$ times before they stop and they will
have a path $2 \cdot (R + 1)^n$ long.
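The two-station example can be simulated directly. The sketch below is a hypothetical reading of it (the function name `count_passes` and the choice of starting every attribute in its highest range are assumptions); under this reading the exact counts come out one short of the round figures, which does not change the exponential growth.

```python
def count_passes(n, R):
    """Simulate the two-station counter example for n control attributes,
    each with R + 1 ranges, starting with every attribute in its top range.

    Returns (passes through S_2, number of stations on the path).
    """
    state = [R] * n            # r_1 ... r_n read as a base-(R+1) number
    passes_s2 = 0
    path_length = 1            # the form starts in S_1
    while any(state):          # S_1 forwards unless all ranges are lowest
        # S_2 subtracts one from the base-(R+1) number, with borrow.
        k = n - 1
        while state[k] == 0:
            state[k] = R
            k -= 1
        state[k] -= 1
        passes_s2 += 1
        path_length += 2       # one visit to S_2, then back to S_1
    return passes_s2, path_length


print(count_passes(2, 2))
print(count_passes(3, 1))
```

No (station, state) pair repeats along this path, so the form is never in a loop even though it visits the same two stations exponentially many times.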

We know that we may have to look at long paths. However, in the following
section we will outline an algorithm which only looks at paths that are actually
taken by forms. Hence, it will look at long paths only if they are taken and do
not lead to infinite loops. In that sense the algorithm is well behaved because
it looks only as far as it should.

4.4.  Algorithm 

As was pointed out in the previous section all forms with attribute values
within the same range are routed and generally treated the same way. In this
section we will try to represent and analyze the different paths that forms take.

Consider a bit string $b$ representing the range of each control attribute of a
form which is defined in the following way.

$$b = (b_1, \ldots, b_n)$$

where for every $k$, $b_k$ corresponds to the control variable $X_k$ and it is a
bit string of $M_k + 1$ bits (one bit for each range in $X_k$) with the bit of
the corresponding range of $X_k$ set at one and all other bits set at zero.

The forms where $X_1$ is in the $n_1$ range (we assume the ranges ordered in a
natural way), $X_2$ in the $n_2$ range, etc., will be represented by the bit
string $b$ where the $n_1$ bit is on in $b_1$, the $n_2$ bit is on in $b_2$,
etc. All these forms will have the same state. They will be treated, therefore,
in the same way in our model of office flow. For instance, going back to our
example of figures 4 and 5, the forms where $X \geq x_3$, $Y \geq y_1$ and
$Z \geq z_3$ will be represented by the bit string (001, 01, 001) where the
commas delimit the bit strings $b_1$, $b_2$ and $b_3$.

We can generalize the bit string $b$ and allow forms where each variable $X_k$
can be in more than one range. In this case more than one bit is set to one for
the substring $b_k$. A bit string $b$ represents the forms where the $X_k$
control attribute, for each $k$, can take values in the ranges corresponding to
the non-zero bits of the substring $b_k$ of it. For instance, in our example the
bit string (011, 01, 011) will represent the bit strings (001, 01, 001),
(010, 01, 001), (001, 01, 010) and (010, 01, 010). The bit string represents
forms where $X \geq x_1$, $Y \geq y_1$ and $Z \geq z_1$.
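A generalized bit string stands for the cross product of the ranges it allows for each attribute. The short sketch below enumerates that cross product; the function name `expand` and the representation of a bit string as a tuple of Python strings are assumptions made for illustration.

```python
from itertools import product


def expand(bit_string):
    """List the single-range bit strings covered by a generalized one.

    bit_string: tuple of substrings such as ("011", "01", "011").
    """
    per_attribute = []
    for sub in bit_string:
        singles = []
        for i, bit in enumerate(sub):
            if bit == "1":
                # Substring with only range i switched on.
                singles.append("0" * i + "1" + "0" * (len(sub) - i - 1))
        per_attribute.append(singles)
    return [tuple(choice) for choice in product(*per_attribute)]


for single in expand(("011", "01", "011")):
    print(single)
```

A substring that is all zeros contributes no choices, so `expand` returns an empty list for a bit string that represents the empty set.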

Consider now a station $S_i$. We claim that each routing predicate $P_{ij}$ can
be represented by a set of corresponding bit strings, where each bit string
corresponds to a conjunction in the definition of $P_{ij}$.

Recall that

$$P_{ij} = \bigvee (C_1 \wedge C_2 \wedge \cdots \wedge C_n)$$

Consider the $m$th conjunction separately. It qualifies a set of ranges for the
control variables and it can be represented by a bit string. Let's assume that
in the definition of ordering of ranges we put as highest all the ranges
corresponding to simple conditions $X_k = u$ (or $X_k \neq u$). We then order
the ranges according to the order of the values of $X_k$. Let us also assume
that the simple conditions $C_1 \wedge \cdots \wedge C_n$ are ordered according
to the control variables $X_k$.

There is a very simple algorithm which we can follow to generate the bit
strings. We start with a string of all 1's and turn off bits each time we
encounter a condition. For $X_k = u$ we turn off the bits of all the other
ranges except the $u$ range. For $X_k \neq u$ we turn off the bit of the $u$
range. For $X_k \geq u$ or $X_k > u$ we turn off the bits of all lower ranges.
For $X_k \leq u$ or $X_k < u$ we turn off the bits of all higher ranges.

As a result of the algorithm we will end up with a bit string for each
conjunction representing the ranges of values where this conjunction is true. If
for a particular bit string the whole substring corresponding to the variable
$X_k$ is zero, then the bit string represents the empty set. We portray in
figure 6 the bit strings for each routing predicate of our example.
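The bit-switching rule for inequalities can be sketched for a single control variable as follows. This is an illustration under stated assumptions: the names `cuts_for` and `substring` are hypothetical, attribute values are taken to be numbers, and equality conditions (which get their own highest-ordered ranges in the text) are left out to keep the example short.

```python
# Side of the constant on which the range boundary falls:
# 0 = cut just below the value, 1 = cut just above it.
CUT_SIDE = {">=": 0, "<": 0, ">": 1, "<=": 1}


def cuts_for(conditions):
    """All distinct range boundaries of one variable, in domain order."""
    return sorted({(v, CUT_SIDE[op]) for op, v in conditions})


def substring(conjunction, cuts):
    """Bit substring of one variable for a conjunction of inequalities."""
    bits = [1] * (len(cuts) + 1)            # start with all ranges allowed
    for op, v in conjunction:
        below = cuts.index((v, CUT_SIDE[op])) + 1   # ranges under this cut
        if op in (">=", ">"):
            bits[:below] = [0] * below              # drop all lower ranges
        else:
            bits[below:] = [0] * (len(bits) - below)  # drop higher ranges
    return "".join(str(b) for b in bits)


# Hypothetical numeric stand-ins for the breaking points z_1 and z_3.
z1, z3 = 10, 30
cuts = cuts_for([(">=", z1), (">=", z3), ("<", z1)])
print(substring([(">=", z1)], cuts))             # Z >= z_1
print(substring([("<", z1)], cuts))              # Z <  z_1
print(substring([(">=", z1), ("<", z3)], cuts))  # z_1 <= Z < z_3
```

The leftmost bit stands for the lowest range, matching the ordering used in the worked example, and a substring of all zeros signals a conjunction that no value can satisfy.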

In the same way that we represented the routing predicates we can
represent the actions on the control variables. Consider an action:

on $C = \bigvee (C_1 \wedge \cdots \wedge C_n)$ set $X_k = u$

The action will be represented by a set of bit strings $e_1, \ldots, e_t$, one
for each conjunction in $C$. We also represent the setting of $X_k$ by flagging
(with an asterisk) the range where the value $u$ falls in the bit strings. The
set of forms which did not qualify for setting $X_k$ will have to be represented
as bit strings of the complement without a flag for setting $X_k$. In our
example the setting operation of $S_1$ is represented by setting a flag in the
case of forms in ranges represented by the bit string (111, 01, 111).

Consider now a station $S_i$ which receives a set of forms as represented by
the bit strings

input = $b_1, \ldots, b_n$

The actions on the station will be represented by performing the following
operation. For each control variable $X_k$ set in the station, obtain all the
non-empty bit strings ($b_j$ and $e_q$) for all combinations of $b_j$,
$j = 1, \ldots, n$ and $e_q$, $q = 1, \ldots, t$ with the appropriate bit
flagged. The and operation is the pairwise conjunction of the corresponding bits
of the strings. We also need to obtain the bit strings corresponding to the
complement which represents the forms which do not qualify for setting the
$X_k$.


The routing can also be represented by pairwise conjunctions of sets of bit
strings. If $P_{ij}$ is represented by the bit strings $a_1, \ldots, a_m$ and
the forms coming out of a station $S_i$ are represented by $b_1, \ldots, b_n$
then the forms going from $S_i$ to $S_j$ are represented by the non-empty bit
strings

($a_j$ and $b_q$) for $1 \leq q \leq n$ and $1 \leq j \leq m$

In our discussion we did not clarify the meaning of conjunction operations
in the case of the flagged bits. If a bit is flagged it implies that the value
of $X_k$ has been set in that range. The flagged bit overrides everything else
in its own substring of $X_k$. Each substring corresponding to an $X_k$ can have
only one flagged bit. If a bit is flagged again then the old flag is dropped
(the $X_k$ has been set to a new value). The flags enable us to retain not only
the fact that the value of $X_k$ has been set in a range but also information on
the initial values of the attribute $X_k$. This information is useful, as will
be seen later on in this section, when we estimate loads.

When we perform a pairwise conjunction between two corresponding substrings
of $X_k$ only one of them is flagged. We have two cases. If the bit of the unflagged
substring in the position of the flag is one, we copy the flagged substring.
If that bit is zero then we set everything to zero. For
instance, consider in our example the string (111, 01, 11*1), which represents
forms that are flagged on entering station $S_1$ (forms where $Y \ge y_1$, so $Z$ is set to
$z_3$). The routing predicate of forms going from $S_1$ to $S_2$, $P_{12}$, is represented by
(111, 11, 011). The routing predicate $P_{13}$ is (111, 11, 100). The pairwise conjunction
between the strings (111, 01, 11*1) and (111, 11, 011) is (111, 01, 11*1).
This conjunction represents the forms going to $S_2$ after being flagged by $S_1$. The
conjunction of (111, 01, 11*1) and (111, 11, 100) is empty. Interpreting our example,
the flagged string represents the forms which have $Z$ set to $z_3$. This operation
implies that none of these forms will go to $S_3$, since $Z > z_1$, while all of them
will go to $S_2$, for the same reason.
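
The flagged conjunction can be illustrated with a minimal Python sketch. The representation is an assumption made for illustration only: each substring is a pair of a bit text and an optional flag position, a bit string is a list of such pairs, and a bit string is taken to be empty as soon as one of its substrings has no bit set. The names `conjoin_substrings` and `conjoin` are hypothetical.

```python
from typing import List, Optional, Tuple

# Assumed representation: (bits, flag). `bits` holds one '0'/'1' per range of
# a control attribute; `flag` is the index of the flagged bit, or None.
Substring = Tuple[str, Optional[int]]


def conjoin_substrings(a: Substring, b: Substring) -> Substring:
    """Conjoin two substrings that describe the same control attribute."""
    (bits_a, flag_a), (bits_b, flag_b) = a, b
    if flag_a is not None and flag_b is not None:
        raise ValueError("at most one operand may be flagged")
    if flag_b is not None:  # normalise so that only `a` can carry the flag
        (bits_a, flag_a), (bits_b, flag_b) = (bits_b, flag_b), (bits_a, flag_a)
    if flag_a is not None:
        if bits_b[flag_a] == "1":
            return bits_a, flag_a        # copy the flagged substring
        return "0" * len(bits_a), None   # flagged range excluded: all zero
    anded = "".join("1" if x == y == "1" else "0" for x, y in zip(bits_a, bits_b))
    return anded, None


def conjoin(s: List[Substring], t: List[Substring]) -> Optional[List[Substring]]:
    """Conjoin two bit strings; None stands for the empty set of forms."""
    result = [conjoin_substrings(a, b) for a, b in zip(s, t)]
    if any("1" not in bits for bits, _ in result):
        return None
    return result


# The strings of the example: (111, 01, 11*1), P12 and P13.
flagged = [("111", None), ("01", None), ("111", 2)]
p12 = [("111", None), ("11", None), ("011", None)]
p13 = [("111", None), ("11", None), ("100", None)]
to_s2 = conjoin(flagged, p12)   # equal to `flagged`
to_s3 = conjoin(flagged, p13)   # None, the empty set
```

With these definitions the conjunction with the predicate of $P_{12}$ returns the flagged string unchanged and the conjunction with the predicate of $P_{13}$ is empty, as in the example above.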

Using these operations on the bit strings we can have an algorithm which
starts with a station $S_i$ where forms originate and explores potential paths that
they can take in a tree fashion. In each case the algorithm keeps track, with bit
strings, of the set of forms which go down that path. When all the bit strings
associated with a path are empty (represent the empty set) that path is not
followed any more.

Care  should  be  taken  when  a  station  repeats  in  a  path.  In  that  case  the  two 
sets  of  input  bit  strings  for  the  two  occurrences  of  the  stations  should  be  com¬ 
pared  to  make  sure  that  there  are  no  forms  coming  back  to  the  station  in  the 
same ranges. If there are, we know they will stay in a loop and we can establish
the  period  of  the  loop  (path  in  between  the  stations).  The  comparison  is  rela¬ 
tively  easy  because  the  bit  strings  of  the  station  occurrence  closer  to  the  root  of 
the  tree  always  cover  the  bit  strings  of  the  new  occurrence  of  the  station.  The 
comparison  operation  should  be  performed  for  any  pair  of  stations  repeating  in 
a  path  since  a  setting  operation  may  reverse  a  form  to  a  very  old  state. 

The algorithm can be summarized in the following manner:

1) Start with a generating station $i$ and feed to it the bit strings representing
the generating predicate.

2) Follow each potential path that a form may take, constructing a tree of
paths along the way.

3) Each time you arrive at a station you have a set of bit strings representing
the input to the station in the particular path you are following. You perform
a pairwise conjunction with the bit strings of the decision table of the
station (if any) and flag the appropriate bit in the resulting strings. Bit
strings representing empty sets are discarded.

4) Each time you exit from a station you perform a pairwise conjunction of the
bit strings representing the forms (as they have been modified by the station)
and the bit strings representing each routing predicate $P_{ij}$, including
the terminating predicate.

5) Each time you enter a station you check whether the same station repeats
in the path. If it does, you perform a comparison of the bit strings
representing the forms as they entered the station each time. The
result of the comparison represents the forms that get in a loop. The
period of the loop is the series of stations in between.
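
A deliberately simplified sketch of the path exploration follows, in Python. It is an illustration under stated assumptions and not the full algorithm: stations have no decision tables, so no attribute is ever set and no bit is ever flagged; a bit string is a tuple holding one integer bitmask per control attribute; and a routing target of `None` stands for the terminating predicate. Under these assumptions a form that re-enters a station is necessarily in the same state, so any repetition is reported as a loop without the comparison of step 5. All names (`explore`, `routes`, the two-station example) are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

BitString = Tuple[int, ...]               # one bitmask per control attribute
Route = Tuple[Optional[str], BitString]   # (target station or None, predicate)
Outcome = Tuple[Tuple[str, ...], List[BitString], str]


def conjoin(a: BitString, b: BitString) -> Optional[BitString]:
    """Pairwise conjunction; None when the result represents the empty set."""
    c = tuple(x & y for x, y in zip(a, b))
    return None if any(mask == 0 for mask in c) else c


def explore(station: str, forms: List[BitString],
            routes: Dict[str, List[Route]],
            path: Tuple[str, ...] = ()) -> List[Outcome]:
    """Follow every path taken by at least one form, building the tree of paths."""
    if station in path:
        # Simplification: nothing is ever set, so a repeated station is a loop.
        return [(path + (station,), forms, "loop")]
    path = path + (station,)
    found: List[Outcome] = []
    for target, predicate in routes.get(station, []):
        passed = [c for c in (conjoin(f, predicate) for f in forms) if c is not None]
        if not passed:
            continue  # no form takes this branch, so it is not followed
        if target is None:
            found.append((path, passed, "exit"))
        else:
            found.extend(explore(target, passed, routes, path))
    return found


# Hypothetical diagram: one attribute with three ranges. S1 sends everything
# to S2; S2 lets the middle range exit and sends the other two back to S1.
routes = {
    "S1": [("S2", (0b111,))],
    "S2": [(None, (0b010,)), ("S1", (0b101,))],
}
classes = explore("S1", [(0b111,)], routes)
```

In this small diagram the exploration yields two classes: the forms in the middle range exit after the path S1, S2, and the forms in the two outer ranges loop with period S1, S2. Handling decision tables and flags would require the flagged conjunction and the state comparison described in steps 3 and 5.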

The  algorithm  works  in  such  a  way  that  guarantees  that  a  path  is  not  fol¬ 
lowed  unless  there  are  potentially  some  forms  that  take  it.  This  is  true  because 
we  keep  track  with  the  bit  strings  of  the  forms  that  take  the  paths.  If  there  are 
no  forms  along  the  path  the  bit  strings  will  represent  the  empty  set.  If  there  is  a 
set  of  forms  which  will  get  into  an  indefinite  loop  it  is  determined  the  first  time 
their  state  repeats.  This  is  true  because  each  time  a  station  repeats  we  try  to 
extract  all  forms  with  that  period  which  get  into  a  loop. 

From these observations it follows that the algorithm will not follow a path
longer than it needs to determine either that forms stop at a station or that they
get into an infinite loop. The algorithm is guaranteed to stop because of the
observation that no path can get longer than $N \prod_k (M_k + 1)$ without involving a loop.

The cost of the algorithm can increase exponentially with the number of
control attributes. The exponential behavior of the algorithm is warranted
because there are cases (albeit pathological) where the number of classes is
exponential. Consider, for instance, our example of two stations and $K$ attributes
as it was outlined at the end of section 2. The number of classes for this
example is equal to the number of possible ranges, which is equal to $(R+1)^K$,
where $R$ is the number of conditions per control attribute. We cannot expect an
algorithm to categorize an exponential number of classes with less than
exponential cost.

In figure 7 we outline the algorithm for the example of figure 4. In this
example  we  chose  to  continue  building  the  paths  until  the  form  would  pass  from 
a  station,  not  only  in  the  same  state,  but  with  the  same  set  values.  In  this  way  a 
loop  not  only  repeats  the  same  stations  but  performs  the  same  actions.  The 
algorithm  distinguishes  classes  of  forms  which  take  certain  paths.  The  classes 
are  always  represented  by  bit  strings.  Figure  9  summarizes  the  classes  and  the 
paths they take. The classes in figure 9 are directly extracted from the tree of
figure 7. It is also very easy to reconstruct from the bit strings the ranges of
control attributes for each class. For instance, figure 8 depicts the ranges of
control attributes taken by class 1. Notice that the number of classes is not
higher than the number of combinations of possible ranges. In our example, this
number is 3 x 2 x 3 = 18. Closer inspection reveals that class 1 represents five input
states, class 2 five input states, class 3 three input states, class 4 two input
states and classes 5, 6, 7 represent one input state each. Each input state
corresponds to a combination of ranges for X, Y, Z as associated with the initial
values of the forms entering station $S_1$.
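
As a quick arithmetic check, the seven classes should partition the 18 input states. The short Python sketch below only restates the counts given in the text; the attribute-to-range assignment (three ranges for X, two for Y, three for Z) is read off the lengths of the substrings in the bit strings of the example.

```python
from math import prod

# Number of ranges per control attribute, read off the bit strings (3, 2, 3).
ranges_per_attribute = {"X": 3, "Y": 2, "Z": 3}
input_states = prod(ranges_per_attribute.values())

# Input states per class, as listed in the text.
states_per_class = {1: 5, 2: 5, 3: 3, 4: 2, 5: 1, 6: 1, 7: 1}

# The classes cover every input state exactly once.
assert sum(states_per_class.values()) == input_states
```

The counts add up, so every combination of initial ranges belongs to exactly one class.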

4.5.  Flow  analysis  and  station  restructuring 

After establishing the form classes and their paths we can determine
whether forms circulate indefinitely. In our example all forms in classes 2, 3, 4,
5 and 6 circulate forever. Only the forms in classes 1 and 7 terminate as
expected. This knowledge can be used in many ways. In one approach we can
use the analysis to restrict the input of the generating stations only to forms
which correspond to terminating classes. In our example, only the forms
represented by (010, 01, 111), (010, 10, 011), (010, 10, 100) terminate. These
three strings represent the forms where $x_1 \le X < x_3$, $Y$ and $Z$ unrestricted. This
situation is easily verified by the diagram. After all, only forms with $x_1 \le X < x_3$ can exit
from either $S_2$ or $S_3$. The rest of the forms cannot exit, hence they have to circulate
indefinitely. In another approach we can change the station diagram by
incorporating escapes, where forms exit from a station the first time it is
detected they are in an infinite loop. This approach assumes that performing
the cycle of the loop indefinitely does not particularly add to the office procedures
and it can be eliminated. Notice that this escape can be built within the
model by routing the forms of the non-terminating classes to a special sink station.

Another type of analysis can determine the loads of the forms to the stations
and the communication channels. The number of forms in each class can
be estimated on the basis of the representation of the class. The problem is
reduced to estimating the number of forms where the attribute values are in
certain ranges. This is analogous to the problem of estimating the number of
records selected by a query to a file [Demolombe 1980]. To illustrate the point
suppose that the attribute values are independent and we know the distribution
of attribute values in the ranges. In that case the probability of a form lying in a
class can be easily obtained from the representation of the class. For instance,
in the case of class 1, the probability of a form being in that class is

$$Q_1 = \mathrm{prob}[(x_1 \le X < x_3)(y_1 \le Y)] + \mathrm{prob}[(x_1 \le X < x_3)(Y < y_1)(z_1 \le Z)]$$

or

$$Q_1 = \mathrm{prob}[x_1 \le X < x_3]\,\mathrm{prob}[y_1 \le Y] + \mathrm{prob}[x_1 \le X < x_3]\,\mathrm{prob}[Y < y_1]\,\mathrm{prob}[z_1 \le Z]$$

where each individual probability is easily determined from the distribution of
the attribute values. If we want to waive the independence assumption, which is
perhaps unrealistic, we have to deal with more general problems of selectivity
[Christodoulakis 1981].

On the basis of the probability $Q_i$ of a form being in a class $i$ we can estimate
the number of forms in each class as $N' Q_i$, where $N'$ is the number of
forms that we input in the station diagram. This in turn gives the load
$N' Q_i M_{ij}$ on a station $S_j$ each time the path of a class $i$ goes through it. The
term $M_{ij}$ is a measure of the difficulty of performing the action $A_j$. We can sum
up these loads down the paths of each class to determine the overall load of all
classes for each station. In a similar way we can get a measure of the load of the
classes on communication channels. This quantitative analysis can help us
match station capacities and channel capacities to the desired loads. If the
capacities are predetermined, we can estimate the number $N'$ of forms that we
can input to the system without exceeding the capacities of stations or communication
channels.
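
The load estimate can be sketched in Python as follows. This is only an illustration under the independence assumption discussed above. The numeric probabilities, the path assigned to class 1 and the difficulty values are invented for the example, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple


def class1_probability(p_x_mid: float, p_y_high: float, p_z_high: float) -> float:
    """Q_1 under independence.

    p_x_mid  = prob[x1 <= X < x3]
    p_y_high = prob[y1 <= Y]   (so prob[Y < y1] = 1 - p_y_high)
    p_z_high = prob[z1 <= Z]
    """
    return p_x_mid * p_y_high + p_x_mid * (1.0 - p_y_high) * p_z_high


def station_loads(n_forms: float,
                  class_probability: Dict[int, float],
                  class_path: Dict[int, List[str]],
                  difficulty: Dict[Tuple[int, str], float]) -> Dict[str, float]:
    """Sum N' * Q_i * M_ij over every visit of class i to station j."""
    loads: Dict[str, float] = {}
    for cls, q in class_probability.items():
        for station in class_path[cls]:
            load = n_forms * q * difficulty[(cls, station)]
            loads[station] = loads.get(station, 0.0) + load
    return loads


# Invented figures: every elementary probability is 0.5 and N' = 1000.
q1 = class1_probability(0.5, 0.5, 0.5)
loads = station_loads(
    n_forms=1000,
    class_probability={1: q1},
    class_path={1: ["S1", "S2"]},            # hypothetical path for class 1
    difficulty={(1, "S1"): 1.0, (1, "S2"): 2.0},
)
```

A station visited several times along a path (for example inside a loop) appears several times in `class_path`, so its load is accumulated once per visit, which matches the phrase "each time the path of a class goes through it".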

We  have  analyzed  the  paths  that  forms  take  and  the  actions  performed  on 
these  paths  by  the  stations.  In  fact,  we  have  completely  categorized  the  forms 
in  classes.  We  can  also  investigate  the  restructuring  of  the  stations.  We  would 
expect  any  restructuring  to  preserve  the  same  actions  on  any  form  entering  the 
system.  In  real  office  systems  we  may  modify  or  eliminate  some  of  these 
actions.  This  is  especially  true  if  we  feel  there  is  duplication  of  activity.  How¬ 
ever,  in  an  abstract  model  we  have  to  treat  the  actions  as  uninterpreted  objects, 
which  cannot  be  modified  or  eliminated  arbitrarily. 

We can define a notion of equivalence between station diagrams in the following
way. First we define a combined action on a form. It is the sequence of
actions that are performed on the form as a result of the form's presence in
each station of its path. The combined action is a series of actions $A_i$. The
actions $A_i$ may involve actions on the form itself, i.e., setting attribute values.
Forms which are in the same class have the same combined action. This situation
follows from the fact that the paths of any two forms in the same class are
the same and the actions $A_i$ (as uninterpreted objects) are the same in each station
$S_i$ of the path. We can talk, therefore, about a combined action of a class of
forms.  For  instance,  class  1  has  combined  action  g. 

Consider the restructuring of the station diagram as in figure 10. First, we
have a switch station which routes the forms according to the definition of their
class. In figure 10 we denote the routing predicates of the switch station in
terms of bit strings. We can obtain, as was portrayed in figure 8, the routing
predicates from the bit strings. We then provide copies of the original stations
$S_i$ as they appear in the path of the class. These copies are portrayed by small
$s_i$ in the diagram of figure 10. A copy $s_i$ of a station $S_i$ is a station which behaves
identically to the station $S_i$ as far as actions are concerned.

It is not difficult to see that the diagram of figure 10 is equivalent to the original
diagram of figure 4. That is, each form has the same combined action in
figure 4 and figure 10. Hence the office procedures resulting from the processing
of these forms are the same irrespective of the details of each action $A_j$.
Notice that there is no routing decision in the diagram of figure 10, except whatever
is provided by the switch station. Every other station $s_i$ routes forms to
only one destination. Hence, we have an equivalent station diagram with routing
greatly simplified.

Consider now the action of stations in more detail. We denote by $\bar{A}_i$ the part
of the action $A_i$ which does not affect a form passing from $S_i$. In view of our restricted
model every action $A_i$ is a combination of setting some attribute values
according to a decision table, followed by the action $\bar{A}_i$ which will remain uninterpreted.
By splitting the actions $A_i$ in these two parts and substituting actions
in figure 10 we obtain an action diagram as in figure 11. Notice that the setting
operations have no decision part. We had to do some class splitting to achieve
this condition. We end up with a diagram where not only do we have no routing
decisions (apart from the switch station) but also no decisions with respect to setting
attribute values. In addition, we can combine the actions which are performed
in terms of the same form contents. We can derive in this way the
diagram of figure 12. We portray by the notation (actions)$^+$ the repetition of the
actions one or more times. This diagram of consolidated actions enables us to
define a new station diagram which is equivalent to the original one, but which
consolidates actions on forms. In this new diagram all indefinite loops are portrayed
by end stations which periodically perform the loop combined action (if it
makes sense). This isolation of indefinite loop activity serves two purposes.
First, it is easier to determine whether the repetition of the loop makes sense.
Second, the combined action of the loop can be simplified if it has repetitive
parts which are redundant. For example, suppose $A_1$ is the action of receiving a
letter from somebody, $A_2$ is giving the address to a secretary and $A_3$ is providing
a response. The cycle ($A_1 A_2 A_3$) can eliminate the action $A_2$ after it goes through
the first time. That is, after the address is given to a secretary once, there is
no reason to give it each time. Notice that this optimization is subject to
interpretation and the change does not conform to our notion of equivalence.
However, such optimizations are more easily done when the looping behavior of the
office procedures is isolated and can be scrutinized.

This  is  a  good  point  to  return  to  our  analogy  of  program  behavior  and  form 
flow.  From  the  restructuring  of  Figure  12  it  is  apparent  that  the  operation  of 
the stations on the forms can be expressed with programs which have the following
constructs. First, they have if ... then ... else statements with branching conditions
represented by simple conditions, variable <op> value. Second, they
have statements assigning constants to variables. Third, they have labels
and unconditional branching statements to the labels. Fourth, they have uninterpreted
functional symbols which do not affect the branching variables. Our
analysis of form flow is equivalent to the analysis of such programs.

5. CONCLUDING REMARKS

We  have  outlined  a  facility  for  managing  forms.  By  forms  we  mean  a  natural 
extension  of  business  forms  which  can  tie  many  media  and  office  information 
objects  together.  In  this  way,  we  can  integrate  different  aspects  of  office  com¬ 
munication  and  processing  systems.  We  have  discussed  the  facilities  of  form 
management  both  abstractly  and  in  relation  to  our  prototype  system.  We 
pointed out, when necessary, the difference between what we think should be
provided and what we have implemented. Facilities in our prototype system are
geared for three types of users. Form operations are mainly geared for office
workers. Data base operations are provided for managers and data processing
personnel. Finally, office procedures are available for office administrators, i.e.,
those office workers with sufficient enthusiasm and expertise to learn how to use
them. 

In our prototype system stations are based on LSI 11/23's running UNIX.
Each LSI 11/23 has up to 256 Kb of memory and a hard disk. A scaled-down version
of the system can work with an LSI 11 with less memory and floppy disks. In
the future we expect a station to be M68000 based with up to 1 Mb of storage and
100 Mb of disk. The stations are connected using a small local network available
from Peritek. The speed of the network is 806 Kb per second. The allowed distance
between the nodes limits stations to the same building. In the
future we would expect a more sophisticated network which can connect stations
within a complex of buildings. We assume that one node in the network is
special and acts as a control node for our system. Stations work independently
but the control node retains some global management of operations. In this
way, our system is not truly distributed. However, we found the control node to
be very useful for control of operations and for backup and recovery.

Our prototype system is operational. Some of its parts are in use at many
sites. Specifically, MRS has already been acquired by 50 organizations worldwide.
The form facility OFS has been acquired by 20 organizations. The office
procedure facility TLA has been completely implemented.

In  this  paper  we  have  also  analyzed  the  flow  of  messages  in  an  office.  The 
analysis  depends  on  a  number  of  reasonable  but  nevertheless  limiting  assump¬ 
tions.  First,  that  the  messages  are  represented  by  forms.  Second,  that  the  sta¬ 
tions  are  consistent.  Third,  that  the  routing  and  form  modification  on  the  sta¬ 
tions  are  not  completely  general.  Finally,  we  should  point  out  that  the  form 
flows  analyzed  in  this  paper  have  a  direct  counterpart  to  the  form  flows  specified 
in  our  prototype  system.  That  is,  we  are  not  analyzing  hypothetical  situations 
but  real  problems  as  they  come  up  from  the  implementation  of  office  procedure 
specification.  In  fact,  our  analysis  is  less  general  than  the  facilities  provided  by 
our  form  procedures. 

Our  analysis  assumed  that  the  stations  are  deterministic.  If  we  want  to 
treat  nondeterministic  behaviour  we  can  define  routing  predicates  which  are  not 
mutually  exclusive.  We  can  go  through  a  similar  algorithm  as  presented  in  this 
paper.  However,  classes  will  have  a  tree  and  not  a  path  of  stations.  The 
interpretation  of  the  tree  is  that  a  form  in  a  class  can  potentially  go  through  any 
path  of  the  tree.  Loops  in  such  an  environment  will  represent  potential  loops. 
The  analysis  becomes  more  involved. 

In  this  paper  we  did  not  treat  the  coordination  problem  between  forms.  We 
assumed  that  all  forms  progress  through  the  stations  independent  of  each  other. 
The  coordination  of  forms  will  be  a  nice  extension  of  the  model.  In  addition,  we 
did  not  discuss  the  queueing  properties  of  the  stations  and  how  forms  proceed  in 
the  network  of  stations.  The  queueing  analysis  of  such  network  stations  is 
another  very  important  problem. 


To summarize, in this paper we have presented an approach to Office
Information Systems based on forms. The emphasis of the paper is on a structured
mode of communication and storage of information. Our basic premise is
that such an approach can be useful for some aspects of mechanization and
automation of office work. There is definitely much room for other approaches
in an office environment. The purpose of this paper is not to conclude that
forms are the only way, or even the major way, of dealing with office problems.
The purpose of the paper is to outline some research directions which are
important, interesting and feasible in the area of Office Information Systems.


Abstract

This  paper  outlines  an  effort  to  introduce  automation  into  an  office  forms  system 
(OFS).  OFS  allows  its  users  to  perform  a  set  of  operations  on  electronic  forms.  Actions 
are  triggered  automatically  when  forms  or  combinations  of  forms  arrive  at  particular 
nodes  in  the  network  of  stations.  The  actions  deal  with  operations  on  forms.  This 
paper  discusses  the  facilities  provided  for  the  specification  of  form-oriented  automatic 
procedures  and  sketches  their  implementation. 

Form Procedures

1.  Introduction 

OFS is an electronic forms management system [Tsichritzis 1980, 1981, Cheung
1979, 1980, Gibbs 1979, 1980]. It provides an interface to MRS, a small relational database
system [Hudyma 1978, Kornatowski 1979, Ladd 1979]. OFS and MRS were written
in C within the UNIX operating system [Kernighan 1978, Ritchie and Thompson 1978].
They have both been distributed widely to organizations.

An  OFS  system  consists  of  a  set  of  stations  distributed  over  a  number  of  machines 
in  a  network.  Each  user  has  a  private  set  of  forms  residing  in  his  station.  A  user  may 
only  manipulate  those  forms  which  he  temporarily  "owns"  in  the  sense  that  they  are 
part  of  his  database.  Communication  and  interaction  between  stations  is  achieved  by 
allowing  users  to  mail  forms  to  one  another. 

A  distinction  is  made  in  OFS  between  form  types,  form  blanks  and  form  instances. 
A  form  blank  is  simply  the  form  template  used  to  display  a  form  instance.  A  form 
instance  corresponds  to  an  actual  filled  form  represented  as  a  tuple  in  the  database  of 
forms. Its fields may have values assigned to them, and it always has a unique key
assigned at creation time by the system. A form type is the specification of a form
blank and a set of field types (see below). A form file is a relation used to store all
forms of the same type belonging to a station. The collection of form files for a station
is a form database. Figure 1 shows a form blank and form instance for the form type
called order. Note that some fields of the form instance need not have values associated
with them. The key field must have a value which is automatically assigned by the
system.

ORDER FORM

KEY:
Customer number:
Customer name:
Item:
Description:
Price:
Quantity:
Total:

An order form blank


ORDER FORM

KEY: 00001.00000
Customer number: 354
Customer name: CSRG
Item: 254
Description: Office Forms System
Price: 200.00
Quantity: 2
Total:

An order form instance


Figure 1 Form blanks and instances

Form fields may be of six different types. Manual fields of type 1 may be inserted
or modified at any time, type 2 may be inserted at any time but not modified, and type
3 must be inserted at form creation and never modified. Automatic fields of type 1 are
key fields, always the first field of a form, type 2 date fields, and type 3 signature fields
bearing the station's name if the preceding field is filled in.
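
The six field types and their update rules can be summarized in a short Python sketch. This is an illustration only and not OFS code (OFS itself is written in C); the names `FieldKind` and `user_may_write` are hypothetical, and the reading that automatic fields are never written by the user is an assumption drawn from the description above.

```python
from enum import Enum


class FieldKind(Enum):
    MANUAL_1 = "manual, insert or modify at any time"
    MANUAL_2 = "manual, insert at any time, never modify"
    MANUAL_3 = "manual, insert at creation only, never modify"
    AUTO_KEY = "automatic key"
    AUTO_DATE = "automatic date"
    AUTO_SIGNATURE = "automatic signature"


def user_may_write(kind: FieldKind, has_value: bool, at_creation: bool) -> bool:
    """Return True if a user may write a value into a field of this kind."""
    if kind is FieldKind.MANUAL_1:
        return True                              # insert or modify at any time
    if kind is FieldKind.MANUAL_2:
        return not has_value                     # insert once, never modify
    if kind is FieldKind.MANUAL_3:
        return at_creation and not has_value     # only while the form is created
    return False                                 # automatic fields: system only
```

For instance, a type 2 manual field that is still empty may be filled in long after the form was created, whereas a type 3 manual field may not.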

Form  operations  are  creation,  selection,  and  modification.  Forms  may  also  be 
attached  to  dossiers.  Dossiers  are  lists  of  forms  which  are  not  necessarily  of  the  same 
form  type,  but  which  have  something  in  common  that  the  user  wishes  to  capture. 

Forms  may  not  be  destroyed,  although  they  may  be  mailed  to  a  "wastebasket  sta¬ 
tion"  which  conceptually  shreds  the  electronic  form.  The  wastebasket  station  may  in 
fact archive rather than erase a form depending upon the needs of a particular application.
Form instances are unique, and must always exist at exactly one location in the
system. They are either in some form file or waiting in a mail tray. Forms may be
mailed from one station to another. They must wait in a mail tray and be explicitly
retrieved in order to be placed in the receiving station's form file. Copies may be made
of forms, but they are assigned a unique key consisting of the key of the original form
together with a system-generated copy number distinguishing it from the original.


Form files may be accessed as a whole using an MRS interface. However, in this
case  no  protection  is  provided  against  illegal  operations  such  as  destroying  a  form  or 
creating  a  form  with  a  key  that  is  already  in  use.  Therefore,  the  MRS  interface  is  not 
meant  to  be  used  except  by  privileged  users. 


OFS  is  basically  a  passive  system,  i.e.,  the  user  has  to  initiate  every  action.  The 
only automatic form processing that OFS will do occurs if a form is mailed to a special
automatic station. Such a station periodically reads its mail and submits the forms as
input to an application program. These programs must be written so as to preserve
compatibility  with  OFS.  Consequently,  the  specification  of  an  OFS  automatic  procedure 
requires  a  great  deal  of  knowledge  of  the  inner  workings  of  OFS.  The  TLA  project  was 
conceived  as  a  tool  to  introduce  automatic  form  processing  into  OFS  [Hogg  1981,  Nier- 
strasz  1981]. 


A set of features was chosen to study the design and implementation issues of a
reasonably useful but unembellished automatic forms system. A number of assumptions
were made about the meaning of a "forms procedure", especially within the context
of OFS.

The user interface is presented in terms of objects with which the OFS user is
already familiar. Specifying operations within a procedure corresponds closely to performing
those operations within a manual system. A user who is editing an automatic
forms procedure manipulates "sketches" of forms. Sketches are form-like objects that
represent the forms that the procedure will eventually manipulate. The same form
template which OFS uses to display form instances is used quite differently in TLA to
describe preconditions and actions in office procedures. The specifications are nonprocedural
and have a simple syntax.

TLA does not assume any knowledge of the system state other than what is available
to the user in his form file or his mail tray. This corresponds to the notion in OFS
that users can only manipulate the forms that they "own". Anything happening outside
a user's own workstation does not concern him. The domain of automation is that of
the individual workstation. The complexity of determining when to trigger a procedure
is thereby considerably reduced.


An automatic procedure is meant to capture the notion of an office worker collecting
forms at his or her desk until a "complete set" is compiled. He or she can then process
the forms and file them or send them on their way. Processing of the
collection of forms may cause forms to be modified or new forms to be added to the
set. Reference tables and calculating tools are made available through an interface to
a local library of application programs.


The  other  aspect  of  automation  supplied  by  TLA  is  that  of  "smart  forms"  which 
automatically  fill  in  certain  fields  using  previously  filled-in  fields  as  arguments.  The 
domain  here  is  that  of  the  form  alone,  so  triggering  takes  place  whenever  a  form  is 
created  or  modified. 


There are two types of automatic fields. The first type is filled in only if all its
argument fields have values. The other type accepts null values, and is filled in even if
some argument fields are missing. Fields are initially filled in sequence. When an
automatic field is reached, an application program written in a conventional programming
language (usually C or the UNIX Shell) is executed. The output from this program
is assigned to that field. If any argument fields are subsequently modified, the
automatic fields which use them are also updated. Typical applications are arithmetic
operations such as sales tax calculations, or database queries such as filling in a
customer's address.

"Smarter  forms"  'with  fields  that  change  value  depending  upon  time  conditions,  the 
state  of  the  system,  or  any  other  variable,  "vvere  not  implemented.  Some  "smarter 
form"  problems  can  be  solved  with  TLA's  automatic  procedures. 
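
The behaviour of automatic fields can be illustrated with a short sketch. The Python fragment below is not TLA code (TLA's field programs were written in C or the UNIX Shell); the names AutoField, SmartForm and ADDRESSES are hypothetical and serve only to show the two field types and the update rule, assuming that dependencies between automatic fields are not circular.

```python
class AutoField:
    """An automatic field: a program applied to named argument fields."""

    def __init__(self, name, args, program, accepts_null=False):
        self.name = name
        self.args = args
        self.program = program
        self.accepts_null = accepts_null  # second type: runs even with missing arguments


class SmartForm:
    def __init__(self, field_order, auto_fields):
        self.field_order = field_order
        self.auto = {a.name: a for a in auto_fields}
        self.values = {}

    def fill(self, entered):
        """Fill the fields in sequence; run a program when an automatic field is reached."""
        for name in self.field_order:
            if name in self.auto:
                self._run(self.auto[name])
            elif name in entered:
                self.values[name] = entered[name]

    def modify(self, name, value):
        """Change a field, then update every automatic field that depends on it."""
        self.values[name] = value
        self._update_dependents(name)

    def _run(self, auto):
        args = [self.values.get(a) for a in auto.args]
        if not auto.accepts_null and any(a is None for a in args):
            return False  # first type: wait until all arguments have values
        self.values[auto.name] = auto.program(*args)
        return True

    def _update_dependents(self, name):
        for auto in self.auto.values():
            if name in auto.args and self._run(auto):
                self._update_dependents(auto.name)


ADDRESSES = {"acme": "12 Main St"}  # stand-in for a database query
order = SmartForm(
    ["customer", "address", "price", "quantity", "total"],
    [AutoField("address", ["customer"], lambda c: ADDRESSES.get(c, "unknown"), accepts_null=True),
     AutoField("total", ["price", "quantity"], lambda p, q: p * q)],
)
order.fill({"price": 10, "quantity": 3})  # total is computed when it is reached
order.modify("customer", "acme")          # address is recomputed from the new argument
```

The address field is of the null-accepting type, so it is filled in (with a default) even before a customer is entered; the total field waits until both of its arguments have values.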

Automatic  procedures  have  preconditions  and  actions,  but  no  postconditions  in 
the usual sense. Satisfying all preconditions guarantees the successful completion of
all actions. There is only a very limited sense in which a procedure may "fail". For
example, it may never be triggered because missing forms do not arrive. Postconditions
may be interpreted in terms of the preconditions of another automatic procedure
to which control of the forms is passed.

Automatic procedures run concurrently with the manual functions of the users.
Conflicts can arise over the form manipulations. Forms being collected by an
automatic procedure could be modified or shipped away manually. They can even be
"stolen" by another competing automatic procedure. This implies that when a complete
set of forms is gathered for some procedure, it has to be temporarily "removed"
from  the  system.  This  operation  safeguards  the  forms  until  they  are  processed. 

2.  Interface 

The specification of an automatic procedure in TLA bears some resemblance to
SBA and OBE [De Jong 1980, Zloof 1980]. The precondition segment of a procedure
resembles a QBE query with forms instead of tables as the data objects.
In the simplest form of a TLA precondition, putting a value in a field of a precondition
indicates that a form is to be found with a field matching that value. The action segment
of the procedure is similar. The simplest operation is to assign to a field the
value specified in an action.

The  order  in  which  forms  needed  by  a  procedure  arrive  is  not  important.  The 
order  in  which  actions  are  performed  is  not  specified  in  detail.  TLA  merely  ensures 
that  the  procedure  be  logically  consistent.  The  specification  is  non-procedural.  The 
user indicates what forms are to be collected, and what is to be done with them. He
does  not  specify  how  they  are  to  be  collected  or  how  the  actions  are  to  be  performed. 

Preconditions  in  TLA  describe  what,  when  and  where.  For  each  procedure  there  is 
a working set of forms. The working set may include forms that come only from certain
workstations, forms local to the station specifying the procedure, or forms that
have just been processed by another automatic procedure. One may also specify a procedure
to run only at certain times or ranges of times.

A TLA procedure is a collection of "sketches". A sketch resembles a form, but is to
be distinguished from form blanks, form types or form instances. A precondition
sketch  indicates  a  request  to  the  system  to  find  "a  form  that  looks  like  this".  An  action 
sketch  indicates  a  request  to  modify  a  form  that  has  already  been  obtained.  In  either 
case  a  sketch  describes  a  form  instance  before  or  after  processing  by  the  procedure. 
The  medium  of  specification  of  a  sketch  is  the  same  form  blank  which  is  the  template 
for  the  form  instance  being  described.  Actions  and  preconditions  which  do  not  refer  to 
information  found  on  a  form  are  specified  by  pseudo-sketches  of  "pseudo-forms".  For 
example,  the  condition  that  a  procedure  process  only  forms  coming  from  user  "John" 
must  be  indicated  on  a  special  source  pseudo-sketch. 

Sketches  are  used  to  capture  the  restrictions  referring  to  values  that  appear  on 
the face of the forms in the working set. Local restrictions are constant field values,
sets or ranges of values, and relations between values of the fields on a given form. The
local  restrictions  refer  only  to  the  values  appearing  on  a  single  form  in  the  working  set. 
TLA  tries  to  determine  whether  a  given  form  satisfies  the  local  restrictions  (including 
the  source  condition)  for  some  sketch  in  some  automatic  procedure.  If  it  does,  TLA 
notes  that  information  and  attempts  to  match  that  form  with  other  forms  to  obtain  a 
complete  working  set  for  that  procedure. 
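
The kinds of local restriction listed above can be made concrete with a small sketch. The Python fragment below is only an illustration: the representation of a sketch as a dictionary, and the names satisfies and matches_locally, are assumptions and not part of TLA.

```python
def satisfies(value, restriction, form):
    """Check one field value against one local restriction."""
    if callable(restriction):                      # relation between fields of one form
        return bool(restriction(value, form))
    if isinstance(restriction, (set, frozenset)):  # set of admissible values
        return value in restriction
    if isinstance(restriction, tuple):             # inclusive range (low, high)
        low, high = restriction
        return value is not None and low <= value <= high
    return value == restriction                    # constant field value


def matches_locally(form, sketch):
    """A form matches a sketch if every restricted field is satisfied."""
    return all(satisfies(form.get(field), r, form) for field, r in sketch.items())


sketch = {
    "description": "Three Letter Acronym",
    "quantity": (1, 100),
    "total": lambda total, form: total == form["price"] * form["quantity"],
}
good = {"description": "Three Letter Acronym", "price": 2, "quantity": 5, "total": 10}
bad = dict(good, quantity=500, total=1000)  # quantity outside the permitted range
```

Every restriction refers only to values on the form being tested, which is what allows a form to be checked on arrival without consulting the rest of the working set.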

Figure  2  is  an  example  of  a  precondition  sketch  instructing  TLA  to  watch  for  order 
forms  requesting  "Tm  tear-drops".  Since  this  information  can  be  found  right  on  the 
order  form,  it  is  a  local  precondition.  A  sample  procedure  including  such  a  sketch 
might  perform  the  single  action  of  returning  a  form  that  says  "We  stopped  making 
those  things  years  ago!". 


ORDER  FORM  KEY: _ 

Customer  number: _  Customer  name: _ 

Item: _  Description:  Three  Letter  Acronym 

Price: _ 

Quantity: _

Total: _ 


Figure  2  A  precondition  sketch 

Global  restrictions  on  the  working  set  of  an  automatic  procedure  are  the  join  con¬ 
ditions  between  values  of  fields  appearing  on  different  forms.  One  expects  all  the  forms 
in a procedure’s working set to be linked by certain common field values. Matching
field  values  are  therefore  probably  adequate  to  model  many  applications  of  automatic 
procedures.  However,  simple  inequality  restrictions  may  also  be  specified. 




Figure 3 shows how a link is made to find an inv form for the item requested on an
order form. Each sketch in a procedure has a name assigned by the user. This name is
prepended to the field name; in this way a field of a different sketch can be referenced
within a sketch. Note that one could equivalently have placed the restriction
"=inv.item" in the item number field of the order precondition sketch.


INVENTORY RECORD                    KEY:

Item: =ord.item
Description:
Price:
Quantity in stock:

Figure 3  A global (join) precondition

We can also restrict the source of mail being processed by an automatic procedure.
Suppose, for example, that the accounting department receives an order form
from  the  ordering  department.  This  may  be  interpreted  as  a  request  to  forward  a 
customer’s  address  to  the  warehouse  so  that  the  order  may  be  filled.  If,  however,  the 
order  form  arrives  from  the  warehouse,  that  may  indicate  that  the  order  has  gone 
through,  and  that  an  invoice  should  be  mailed  out.  Figure  4  shows  an  origin  pseudo¬ 
form  sketch  for  such  an  application.  Forms  may  thus  be  processed  differently  depend¬ 
ing  upon  their  point  of  origin.  Alternatively,  the  special  field  not  may  be  filled  in  to 
indicate  that  only  forms  coming  from  stations  not  listed  in  the  pseudo-sketch  should 
be  processed  by  the  procedure.  The  pseudo-station  me  is  also  available  to  indicate 
that  forms  must  (or  must  not)  come  from  within  the  station’s  own  files. 
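
The source test described above amounts to a membership check with an optional negation. The following Python sketch is illustrative only; the function names and the station names used in the example are assumptions.

```python
def source_matches(origin, stations, own_station, negate=False):
    """Test a form's point of origin against an origin pseudo-sketch.

    stations: station names listed on the pseudo-sketch ("me" means this station)
    negate:   True when the special field "not" is filled in
    """
    listed = {own_station if s == "me" else s for s in stations}
    return (origin in listed) != negate


def accounting_action(origin):
    # Hypothetical dispatch for the accounting example in the text.
    if source_matches(origin, ["ordering"], own_station="accounting"):
        return "forward customer address to warehouse"
    if source_matches(origin, ["warehouse"], own_station="accounting"):
        return "mail invoice"
    return None
```

In TLA the two cases would be two separate procedures, each with its own origin pseudo-sketch; the single dispatch function here merely places them side by side.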

Figure 4  An origin pseudo-sketch

All form modification actions are indicated on action sketches. Every form manipulated
by a forms procedure has a precondition sketch and an action sketch. Actions
which do not concern themselves with field values must be expressed via pseudo-forms.


The action form sketch indicates all insertions and updates to the form. The
values to be inserted may be constant values (e.g., an authorization), copied field values,
or possibly function calls to application programs. We distinguish, therefore, between
the original and the updated value of any field. A field which must be copied to another
form may itself be modified, and the wrong value must not be used. Furthermore, the
function calls may access both the original and updated values of fields. In fact, the
original value of a field will often be one of the arguments to a function call update to
that field.


The action sketch of figure 5 illustrates several features. The price of an item is
filled in by copying it from an inv form. A program called "mult" is called to calculate
the total. Finally, the original value of quantity is accessed whereas the updated value
of price is used. Note that the symbols #, ? and ! are used to access, respectively,
functions, original field values and updated field values. If none of these symbols is used, a
constant string value is inserted.

ORDER FORM                          KEY:

Customer number:                    Customer name:
Item:                               Description:
Price: ?inv.price
Quantity:
Total: #mult !price ?quantity

Figure 5  An action sketch
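
The distinction between original and updated values can be sketched as follows. This Python fragment is an illustration under stated assumptions: it takes '#' to introduce a function call, '?' an original value and '!' an updated value, as the entries of figure 5 suggest; the names PROGRAMS, resolve and evaluate are hypothetical.

```python
import copy

PROGRAMS = {"mult": lambda a, b: a * b}  # hypothetical local library of programs


def resolve(token, own, original, updated):
    """'?f' reads an original value, '!f' an updated value; anything else is a constant."""
    if token[0] not in "?!":
        return token
    store = original if token[0] == "?" else updated
    sketch, _, field = token[1:].rpartition(".")
    return store[sketch or own][field]  # an unqualified name refers to the sketch's own form


def evaluate(entry, own, original, updated):
    """Evaluate one field entry of an action sketch."""
    if entry.startswith("#"):
        program, *args = entry[1:].split()
        return PROGRAMS[program](*(resolve(a, own, original, updated) for a in args))
    return resolve(entry, own, original, updated)


original = {"ord": {"price": None, "quantity": 3, "total": None}, "inv": {"price": 5}}
updated = copy.deepcopy(original)
updated["ord"]["price"] = evaluate("?inv.price", "ord", original, updated)
updated["ord"]["total"] = evaluate("#mult !price ?quantity", "ord", original, updated)
```

The original values are never overwritten, so a field that is copied elsewhere and also modified cannot supply the wrong value.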

Some  analysis  is  needed  to  ensure  that  every  updated  field  ultimately  depends 
only  upon  values  originally  available  on  the  working  set  of  forms.  It  is  clearly  incorrect 
to  update  each  of  two  fields  by  copying  over  the  updated  value  of  the  other.  Suppose 
that the price field of the order form were updated to "!inv.price" and the price field of
the inventory form were updated to "!order.price". No order of execution could make
sense  of  the  request. 

Field  constraints  must  be  obeyed.  Procedures  that  create  forms  must  fill  in  cer¬ 
tain  fields.  Procedures  that  modify  forms  must  only  modify  fields  with  an  appropriate 
type.  Implied  actions  must  also  be  evaluated  if  a  procedure  modifies  or  inserts  a  field 
which  is  an  argument  to  an  automatic  field. 

After  all  form  modifications  are  completed,  zero  or  more  copies  of  each  form  are 
made.  Each  form  or  copy  may  then  be  left  in  the  user’s  files,  inserted  into  a  dossier  or 
shipped  to  another  station.  The  mechanism  used  to  specify  these  operations  is  the  des¬ 
tination  pseudo-sketch;  an  example  is  shown  as  figure  6.  Copy  0  is  the  form  manipu¬ 
lated  by  a  procedure,  and  one  additional  destination  pseudo-sketch  is  filled  in  for  each 
copy  of  that  form.  The  operations  available  are  leave,  ship  and  dossier.  The  first  of 
these requires no where argument, but the others require the name of a station or a
dossier respectively. This may be given as a simple constant or a field or function
value, just as in action sketches.


DESTINATION PSEUDO-SKETCH           COPY: 0

Operation: ship
Where: accounting

Figure 6  Destination pseudo-sketch

A  weak  sort  of  postcondition  is  available  by  employing  a  function  call  to  decide  the 
operation,  dossier  name  or  shipping  destination.  General  postconditions  can  only  be 
achieved by cooperating form procedures which accept different cases of the working
set of forms. Suppose, for example, that the processing of an order causes the quantity
of an item in stock to dip below a certain acceptable level. We may wish, at this
point,  to  send  a  memo  to  the  manager  initiating  an  increase  in  the  production  of  the 
item.  The  procedure  which  processes  orders  is  incapable  of  conditionally  producing 
this  memo  as  a  postcondition  to  inventory  update.  It  could  unconditionally  produce 
such  a  memo  and  then  functionally  decide  to  mail  it  either  to  the  manager  or  to  a  gar¬ 
bage  collection  station.  A  cleaner  approach,  though,  is  to  have  a  separate  procedure 
which searches for low inventory items, and then sends the memo.
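
The "weak" postcondition, in which a function call chooses the destination, might look like the following Python sketch. The names memo_destination and dispatch, the reorder level of 10 and the station name "garbage" are assumptions made for illustration only.

```python
def memo_destination(working_set, reorder_level=10):
    """Function value for the 'where' field: route the memo according to stock."""
    low = working_set["inv"]["quantity"] < reorder_level
    return "manager" if low else "garbage"


def dispatch(operation, where=None, working_set=None):
    """Carry out one destination pseudo-sketch for one copy of a form."""
    if callable(where):
        where = where(working_set)  # a function value decides the destination
    if operation == "leave":
        return ("leave", None)      # 'leave' takes no where argument
    return (operation, where)       # 'ship' needs a station, 'dossier' a dossier name
```

The memo is always produced; only its destination varies. The separate low inventory procedure favoured in the text avoids producing the unwanted memo at all.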

With  this  approach  individual  tasks  are  clearly  identified.  Automatic  procedures 
are simple and completely devoid of any control flow. Furthermore, the implementation
is simpler because postconditions correspond to separate procedures. The low
inventory checker, for example, is only invoked when an inventory form is updated.

3.  Implementation


An  automatic  forms  procedure  in  TLA  is  specified  by  a  collection  of  sketches,  and 
as such describes what is to be done rather than how to do it. The sketch representation
is very convenient for the user. This format, however, is wholly unsuitable for
implementation.  The  specification  must  be  analysed  and  translated  for  greater  run¬ 
time  efficiency. 


We  cannot  predict  when  the  forms  required  to  trigger  a  forms  procedure  may 
arrive.  The  processing  must,  therefore,  of  necessity  be  broken  into  distinct  parts.  The 
specification  in  terms  of  sketches  contains  information  of  four  basic  kinds:  local  (form) 
constraints,  global  (working  set)  constraints,  duplicate  form  types  (so  that  one  form  is 
not  used  to  match  two  sketches  within  a  single  working  dossier),  and  actions.  The  exe¬ 
cution  of  a  forms  procedure  makes  use  of  these  four  specifications  at  different  stages. 
It is convenient to process these specifications at procedure definition time, and
translate  them  into  formats  that  require  no  further  run-time  analysis. 


Suppose  that  TLA  is  notified  of  the  availability  of  a  form  for  automatic  processing. 
It first checks whether the form matches the local conditions of any precondition
sketch for that form type. The local conditions comprise the source restriction
and the field constraints. If a form does not match the local constraints of any precondition
sketch, then TLA assumes that no procedure is prepared to handle it. Suppose
that a form does match the local constraints of one or more precondition sketches.
That form is then a candidate for a working set for some procedure(s). It is immaterial
whether or not a working set including that form is complete. There is always the possibility
that at some time the missing forms of the working set could arrive.

The form instance in figure 7 matches the local condition of the precondition
sketch, quantity>0. There may not necessarily be a global match if there is no
order form with the same item number. Even if there is an order form with the same
item  number,  it  may  not  satisfy  the  other  constraints  of  its  precondition  sketch. 
Nevertheless,  TLA  notes  that  a  local  match  has  been  made  and  waits  for  the  rest  of  the 
working  set  to  arrive. 


INVENTORY RECORD                    KEY:

Item: =ord.item                     Description:
Price:
Quantity in stock: >0

Precondition sketch


INVENTORY RECORD                    KEY: 00001.00000

Item: 4-nf5                         Description: Workstation
Price: 16000.00
Quantity in stock: 12

Form instance matching local preconditions


Figure 7  Local matching

TLA  checks  the  local  constraints  of  a  form,  records  its  findings,  usually  determines 
that  the  form  does  not  complete  a  working  set,  and  then  waits  for  more  forms  to 
arrive. Further processing may not occur for some time. All local constraints for
forms of the same type are extracted from all procedures and stored in a common file.
This  file  is  opened  to  check  the  local  constraints  of  a  given  form  for  all  procedures. 

After the local constraints have been matched for a form, TLA checks link conditions
between the corresponding sketches of the procedure. The link conditions are
stored in files by procedure. Suppose that, in the previous example, TLA found an
order for item 0002. It would note that the link between the inventory and order form
precondition sketches was satisfied by these two form instances. If the working set
consisted of only these two forms, then the procedure actions would be performed.
Otherwise, TLA will wait until forms are found to match the remaining links of the procedure.

Even  if  forms  arrive  together,  the  processing  of  the  forms  is  sequential.  TLA  treats 
each  form  individually.  A  locking  algorithm  guarantees  that  two  forms  cannot  be  pro¬ 
cessed  at  once  at  a  given  workstation.  Generally  forms  will  not  arrive  simultaneously. 
One  can  expect  a  considerable  delay  between  the  establishment  of  local  constraints 
and the evaluation of links between forms.

Actions  are  performed  only  once  a  working  set  of  forms  has  been  compiled. 
Actions are stored in a separate file. TLA preprocesses procedures to check the legality
of actions and to determine a legal order of execution if one exists. No further run-time
analysis is performed. Actions run to completion.

The  example  in  figure  8  implicitly  requires  that  price  must  first  be  copied  from 
the  inventory  form  before  its  value  may  be  multiplied  by  the  quantity.  This  establishes 
a  legal  order  of  actions  for  that  sketch. 
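
Determining a legal order of actions is a topological sort of the dependencies between updated fields. The sketch below uses the graphlib module of the Python standard library (Python 3.9 or later); it is not the method TLA itself uses, and the field names and the function legal_order are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter


def legal_order(reads_updated):
    """Return an order of field updates, or None if no legal order exists.

    reads_updated maps each updated field to the fields whose *updated*
    values it reads; those fields must be updated first.
    """
    try:
        return list(TopologicalSorter(reads_updated).static_order())
    except CycleError:
        return None


# Price copies an original value, so it depends on no other update;
# total reads the updated price, so price must be filled in first.
figure_8 = {"ord.price": set(), "ord.total": {"ord.price"}}

# Two fields each copying the updated value of the other: no order is legal.
circular = {"ord.price": {"inv.price"}, "inv.price": {"ord.price"}}
```

Because the analysis depends only on the sketches, it can be carried out once, when the procedure is defined.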

An  admittedly  unlikely  case  is  captured  in  figure  9  which  is  triggered  if  TLA 
detects two inventory forms for a single item. Since there are two precondition
sketches  in  the  procedure,  TLA  assumes  that  they  refer  to  two  different  forms  in  the 
working  set.  Otherwise,  any  inventory  form  would  trivially  satisfy  both  precondition 
sketches  and  thus  trigger  the  procedure.  When  the  procedure  is  written,  TLA  notes 
immediately that two precondition sketches describe forms of the same type. It performs
a key comparison of those forms in any working set identified to guarantee that
they are not one and the same.

ORDER FORM                          KEY:

Customer number:                    Customer name:
Item:                               Description:
Price: ?inv.price
Quantity:
Total: #mult !price ?quantity

Figure 8  Ordering of actions


INVENTORY RECORD                    KEY:

Item:                               Description:
Price:
Quantity in stock:

Precondition sketch inv1


INVENTORY RECORD                    KEY:

Item: =inv1.item                    Description:
Price:
Quantity in stock:

Precondition sketch inv2


Figure 9  Duplicate form types in a procedure

The  TLA  automatic  procedure  interpreter  is  triggered  upon  receipt  of  mail,  form 
creation  and  form  modification.  Since  the  last  two  are  the  responsibility  of  the  user, 
triggering in these cases involves only the spawning of a new interpreting process. In
the first case, however, the interpreting process is initiated by the user who sent the
mail.


Automatic  procedures  are  meant  to  run  regardless  of  whether  the  user  to  whom 
the corresponding station belongs ever signs on after the procedure is written. Mail in
the  system  is  routed  through  a  host  control  node.  The  sending  station  sends  a  message 
to  the  host  consisting  of  the  contents  of  the  form  tuple  and  the  name  of  the  station 
which  is  to  receive  the  mail.  The  host  then  stores  the  form,  updates  the  receiving 
station’s  mail  tray  and  sends  a  message  to  the  recipient’s  station.  At  the  recipient’s 
station  machine,  the  interpreting  process  is  started.  It  communicates  with  the  host, 
asking  for  images  of  each  new  form  in  the  recipient’s  mail-tray.  The  interpreter  main¬ 
tains  files  of  form  images  for  each  form  available  for  automatic  processing.  It  deletes 
the images when the forms have been processed either automatically or by the user.
The images are copies of the contents of each form for use by the interpreter alone,
and are stored just as forms are stored. The user, however, has no access to the
images as forms. They may not be modified, shipped away, or otherwise manipulated.
They are not properly forms or copies of forms, but merely images of forms.


Mail  may  arrive  while  the  interpreter  is  running.  It  therefore  continues  to  process 
all  mail  until  it  discovers  an  empty  tray  in  a  manner  similar  to  that  of  the  line  printer 
daemon in UNIX. Only one interpreter may run at any time for a given station. In this
way we eliminate interference problems between interpreters. A lock is placed on the
running  of  the  interpreter  for  a  given  station. 
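
The daemon-like behaviour of the interpreter can be sketched as follows. This is a minimal Python illustration, assuming an in-memory mail tray and a thread lock in place of whatever locking the UNIX implementation uses; the class Station and its methods are hypothetical.

```python
import threading
from collections import deque


class Station:
    def __init__(self):
        self.mail_tray = deque()
        self.processed = []
        self._interpreter_lock = threading.Lock()

    def process(self, form):
        self.processed.append(form)  # placeholder for local and global matching

    def run_interpreter(self):
        """Drain the mail tray; give up at once if an interpreter already runs."""
        while self.mail_tray:
            if not self._interpreter_lock.acquire(blocking=False):
                return False  # the running interpreter will pick up the new mail
            try:
                while self.mail_tray:  # mail may arrive while we are working
                    self.process(self.mail_tray.popleft())
            finally:
                self._interpreter_lock.release()
        return True

    def receive(self, form):
        """Called when the host announces new mail for this station."""
        self.mail_tray.append(form)
        return self.run_interpreter()
```

The outer loop re-examines the tray after the lock is released, so that mail arriving just as an interpreter finishes is not left unprocessed.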


4.  Sketch  and  Instance  Graphs 


The working set of a form procedure is abstracted in terms of a sketch graph with
the sketches as coloured vertices, and the matching conditions as edges in the graph.
The form gathering algorithm must find corresponding forms and satisfy matching conditions
of the sketch graph. An instance graph is generated from the forms
retrieved. The interpreter tries to match the sketch graph in the instance graph.

Consider  the  precondition  sketches  in  figure  10.  A  link  between  the  account  and 
order  forms  is  established  across  the  customer  number.  A  link  between  the  order  and 
inventory  forms  is  captured  by  two  global  conditions,  one  by  item  number  and  the 
other  by  quantity. 


CUSTOMER ACCOUNT                    KEY:

Customer number: =order.number
Credit rating:
Balance:


ORDER FORM                          KEY:

Customer number:                    Customer name:
Item:                               Description:
Price:
Quantity: <=inv.quantity
Total:


INVENTORY RECORD                    KEY:

Item: =order.item                   Description:
Price:
Quantity in stock:


Figure 10  Precondition sketches of a procedure


The corresponding sketch graph is shown in figure 11. Each sketch is represented
by a labelled/coloured node. Each collection of global conditions between a pair of
sketches is represented by a single edge.

When  a  form  is  passed  to  the  interpreter,  it  first  reads  the  file  of  local  constraints 
for  the  forms  of  that  type.  Whenever  a  match  is  found,  the  interpreter  notes  which 
sketch  of  which  procedure  is  matched  by  the  form,  and  it  enters  a  tuple  consisting  of 
the  form  type,  the  form  key,  the  procedure  and  the  sketch  matched  into  a  relation 
(called  "NODE"). 

The file of global constraints for the procedure matched is then read. For every
link concerning the matched sketch, TLA establishes whether the current form satisfies
the join conditions with any of the forms previously recorded in the NODE relation. For
every new link found, TLA inserts a tuple into another relation called EDGE. EDGE
records  the  form  keys,  types,  sketch  names  and  procedure  name  of  every  link  esta¬ 
blished. 
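
The bookkeeping just described can be sketched with two sets of tuples standing in for the NODE and EDGE relations. The Python fragment below is a simplified illustration: the class InstanceGraph, the use of predicates for constraints, and the omission of form types from EDGE tuples are all assumptions.

```python
class InstanceGraph:
    def __init__(self, local_constraints, links):
        self.local_constraints = local_constraints  # {form_type: [(procedure, sketch, test)]}
        self.links = links                          # {procedure: [(sketch_a, sketch_b, join)]}
        self.images = {}    # form key -> form contents
        self.node = set()   # NODE: (form_type, key, procedure, sketch)
        self.edge = set()   # EDGE: (procedure, sketch_a, key_a, sketch_b, key_b)

    def add_form(self, form_type, key, contents):
        """Record every sketch the form matches locally, then look for new links."""
        self.images[key] = contents
        for procedure, sketch, test in self.local_constraints.get(form_type, []):
            if test(contents):
                self.node.add((form_type, key, procedure, sketch))
                self._link(procedure, sketch, key)

    def _link(self, procedure, sketch, key):
        for sketch_a, sketch_b, join in self.links.get(procedure, []):
            if sketch not in (sketch_a, sketch_b):
                continue
            other = sketch_b if sketch == sketch_a else sketch_a
            for _, other_key, proc, sk in list(self.node):
                if proc != procedure or sk != other or other_key == key:
                    continue
                key_a, key_b = (key, other_key) if sketch == sketch_a else (other_key, key)
                if join(self.images[key_a], self.images[key_b]):
                    self.edge.add((procedure, sketch_a, key_a, sketch_b, key_b))


local = {
    "order": [("fill_order", "ord", lambda f: True)],
    "inventory": [("fill_order", "inv", lambda f: f["quantity"] > 0)],
}
links = {"fill_order": [("ord", "inv", lambda o, i: o["item"] == i["item"])]}
graph = InstanceGraph(local, links)
graph.add_form("inventory", "i1", {"item": "0002", "quantity": 12})
graph.add_form("order", "o1", {"item": "0002"})
graph.add_form("order", "o2", {"item": "0007"})
```

Only the newly arrived form is compared against previously recorded nodes, so each arrival costs one pass over the relevant part of NODE.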


account        order        inventory
   *-------------*--------------*

Figure 11  Sketch graph for a single procedure

The  NODE  and  EDGE  relations  describe  an  instance  graph  with  forms  as  vertices  or 
nodes and links between them as edges. The vertices are coloured according to which
sketch the form matches. If a form matches two or more distinct sketches in one or
more procedures, it is multiply represented, once for each sketch. Procedure names
partition the instance graph, since there can be no links between sketches of different
procedures. For each partition we wish to match the sketch graph that describes the
working set of forms for that procedure. Nodes are assigned a unique colour for each
sketch, and the corresponding colours are used in the instance graph. An instance of
the sketch graph, then, must be found within the instance graph.

Figure  12  shows  the  instance  graph  for  the  procedures  of  figure  9.  Forms  have 
been  found  to  match  each  of  the  precondition  sketches  of  the  procedure,  but  there  is 
no  complete  working  set.  When  a  working  set  is  found,  it  is  processed  and  it  disappears 
from  the  instance  graph.  Note  that  most  of  the  disconnected  subgraphs  of  the 
instance  graph  are  in  fact  subgraphs  of  the  sketch  graph.  In  the  last  case,  however, 
there are two orders for a single item, and the relationship is not that simple. The first
account form to complete either working set will complete the "copy" of the sketch
graph  to  be  found  in  the  instance  graph. 

[Instance graph drawing not recoverable; surviving node labels: account, inventory]

Figure 12  The instance graph for a procedure

The  relationships  between  the  forms  in  the  working  set  of  a  form  procedure  are 
usually  best  expressed  in  terms  of  the  join  conditions.  The  sketch  graph  will  generally 
be  connected.  The  instance  graph,  however,  will  more  often  consist  of  several  partially 
complete working sets of forms, and so will usually be disconnected.

If  the  join  conditions  imposed  on  the  working  set  of  forms  are  "nice"  then  each 
connected  subgraph  of  the  instance  graph  will  also  be  a  subgraph  of  the  sketch  graph. 
It is conceivable, however, that two forms satisfying a precondition sketch may each
satisfy a join condition with a third form satisfying a second sketch in the same procedure.
This anomaly will occur if the imposed join conditions are "not nice enough".
In  this  case,  the  connected  subgraphs  of  the  instance  graph  are  not  as  simply  related 
to  the  sketch  graph.  Thus,  establishing  when  a  complete  working  set  of  forms  has  been 
compiled  requires  careful  analysis. 

When TLA has finished processing a form, we know that the instance graph contains
no copies of the sketch graph. If a copy of the sketch graph is identified, then a working
set has been found, the procedure is executed, and the corresponding nodes and
edges are purged from the instance graph. No more working sets remain. When a new
form  arrives,  a  working  set  of  forms  may  be  completed  only  if  that  new  form  is 
included.  The  analysis  of  the  instance  graph,  then,  need  only  concern  the  connected 
subgraphs  which  include  nodes  representing  the  new  form. 

Join conditions giving rise to sketch trees seem natural, since the "cheapest"
description  of  the  relationships  between  sketches  would  contain  no  cycles.  If  A  is 
related  to  B  and  B  is  related  to  C,  then  one  would  hope  not  to  find  any  other  relation¬ 
ship  holding  between  A  and  C.  In  practice,  however,  things  may  not  be  that  simple. 
Join  conditions  might  give  rise  to  cycles,  or  even  disconnected  sketch  graphs.  Suppose 
that  the  warehouse,  for  example,  has  a  single  value  form  at  its  workstation  keeping 
track  of  the  total  dollar  value  of  its  stock.  The  procedures  which  update  it  would 
include  a  blank  precondition  sketch  for  a  value  form.  Since  there  is  no  confusion 
about  which  value  form  is  needed,  there  are  no  local  or  global  conditions  to  be 
specified  for  it.  The  corresponding  sketch  graph  in  figure  13  is  therefore  disconnected. 

account        order        inventory        value
   *-------------*--------------*                *

Figure 13  A disconnected sketch graph

5.  Graph-chasing

The algorithm which searches the instance graph for a copy of the sketch graph
employs a list of potential working sets. Initially there exists a single such set containing
only the key of the newly added form. Edges are traversed in the instance graph
and keys are added to each set until all the edges and nodes in the sketch graph have
been checked.

We  start  at  the  node  of  the  sketch  graph  corresponding  to  the  new  form.  We 
traverse  edges  leading  out  from  that  node,  and  check  off  any  new  nodes  that  we  reach. 
We  may  follow  any  previously  untraversed  edges  leading  from  any  node  we  have  thus 
far reached. Edges will lead back to old nodes wherever cycles occur. If the sketch
graph  is  disconnected,  then  the  subgraph  containing  the  first  node  will  be  traversed 
first.  Edges  not  in  that  subgraph  cannot  lead  from  old  nodes  until  an  edge  is  traversed 
which  checks  off  two  new  nodes. 

The  sketch  and  instance  graphs  in  figure  14  will  be  used  to  illustrate  the  graph- 
chasing  algorithm.  The  example  contains  both  cycles  and  disjoint  subgraphs. 

Sketches  3  and  5  are  sketches  for  the  same  form  type  but  represent  distinct 
forms in the procedure. The terms {a, b, c, ..., p} are keys belonging to forms that
match the local conditions of the sketch graph. Form a, for example, matches sketch
1. Edges in the instance graph represent joins. Forms c and f, for example, satisfy the
global  conditions  between  sketches  2  and  3. 

The addition of form p results in the completion of the working set {a,c,f,h,p}
where previously no complete working set existed. The algorithm presented here will
identify this set of forms.

[Graph drawings not recoverable. Panels: "Sketch graph (type(3) = type(5))" and
"Instance graph (p is the most recently added node)"]

Figure 14  Sample sketch and instance graphs

As we trace a path through the sketch graph, we try to mimic our actions
non-deterministically in the instance graph. If we follow an edge in the sketch graph, we
attempt to follow that edge in the instance graph for each set in our list. For each success
we add a new key to some set, and for each failure, we delete a set. Suppose that
several edges may be traversed in the instance graph for a given edge of the sketch
graph. We then split the current set and add a new node for each copy. The closing of
a cycle in the sketch graph corresponds conceptually to a select on the set list. In this
way we ensure that links actually exist in the instance graph for the two relevant forms
represented  in  each  set. 

Figure 15 describes the steps followed in locating the working set in our example.
If at any point all working sets are lost, the algorithm halts with no working set of forms
identified.


potential working sets
(slots 1  2  3  4  5)      action

       -  -  -  -  p       p is a new form matching sketch 5.

       -  -  f  -  p       From node 5 in the sketch graph we can reach node 3
       -  -  g  -  p       along edge (3,5). The edges ((3,f),(5,p)) and
                           ((3,g),(5,p)) in the instance graph are followed and
                           the potential working set is "split".

       -  c  f  -  p       The edge (2,3) is now followed, splitting the first
       -  d  f  -  p       set of the previous step.
       -  d  g  -  p

       a  c  f  -  p       Follow edge (1,2).
       b  d  f  -  p
       b  d  g  -  p

       a  c  f  -  p       Edge (2,5) completes a cycle. Perform a select on the
                           sets resulting from the last step. Since
                           ((2,d),(5,p)) is not in the instance graph, two
                           potential working sets are lost.

       a  c  f  h  p       All the edges in the sketch graph have been
                           traversed. A form that matches sketch 4 must be added.

       a  c  f  h  p       Check that form f differs from form p.

Figure 15  Finding a working set of forms

The sketch and instance graphs are described as follows: The sketch graph is
G'(N',E') where N' = {1, ..., n} is the set of colours and E' is a subset of N' x N' containing
no (i, j) where i = j. F is the set of form keys. The instance graph is G(N,E) where N is a
subset of N' x F and E is a subset of N x N. Furthermore, we adopt the convention that
if x = (i, k) belongs to N, then x' = i and x'' = k, and if e = (x, y) belongs to E, then e' =
(x', y').

In the example,

N' = {1,2,3,4,5},

E' = {(1,2), (2,3), (3,5), (2,5)},

F = {a,b,c,d,f,g,h,l,m,p},

N = {(1,a), (1,b), ..., (5,p)}, and

E = {((1,a),(2,c)), ((1,b),(2,d)), ..., ((2,c),(5,p))}.

We note, then, that for each x in N, x' must belong to N', and for each e in E, e'
must belong to E' -- i.e. nodes and edges in the instance graph correspond to nodes and
edges of the sketch graph.

Suppose that finding a complete set of forms is equivalent to locating an instance
of the sketch graph within the instance graph. We can express this as follows: We seek
all subsets N'' of N such that (i) {x' | x in N''} = N' and (ii) for each (i, j) in E', there exist
x and y in N'' such that x' = i, y' = j and (x, y) belongs to E -- i.e. for each node and
edge of the sketch graph there exist unique corresponding nodes and edges in the
spanning graph G'[N''].

In the example

N'' = {(1,a), (2,c), (3,f), (4,h), (5,p)}.
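Conditions (i) and (ii) can be checked mechanically for any candidate subset. The following Python fragment is only a sketch of such a check. The function name is_working_set is hypothetical, and the instance edges are an assumption reconstructed from the narrative accompanying Figure 15, not data taken from the figure itself.

```python
# Sketch graph of the example: colours N' and edges E'.
SKETCH_NODES = [1, 2, 3, 4, 5]
SKETCH_EDGES = [(1, 2), (2, 3), (3, 5), (2, 5)]

# Assumed instance edges, each written ((colour, key), (colour, key)).
INSTANCE_EDGES = {
    ((1, "a"), (2, "c")), ((1, "b"), (2, "d")),
    ((2, "c"), (3, "f")), ((2, "d"), (3, "f")), ((2, "d"), (3, "g")),
    ((3, "f"), (5, "p")), ((3, "g"), (5, "p")),
    ((2, "c"), (5, "p")),
}


def is_working_set(candidate, sketch_nodes, sketch_edges, instance_edges):
    """Return True if candidate, a set of (colour, key) nodes, meets (i) and (ii)."""
    # Condition (i): the colours of the candidate are exactly N', one node each.
    colours = [colour for colour, _key in candidate]
    if sorted(colours) != sorted(sketch_nodes):
        return False
    key_of = dict(candidate)
    # Condition (ii): every sketch edge (i, j) is matched by an instance edge.
    return all(
        ((i, key_of[i]), (j, key_of[j])) in instance_edges
        for i, j in sketch_edges
    )
```

Under these assumptions the subset N'' given above passes the check, whereas replacing (1,a) and (2,c) by (1,b) and (2,d) fails it because no edge joins (2,d) to (5,p).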

The algorithm for finding all such subsets N'' makes use of the knowledge that any
working set of forms must include the most recently added node, say x. Furthermore,
there are two checklists, node and edge, with slots for each element of N' and E'
respectively. These record whether or not the edges and nodes have been inspected.
All are initially set to false, and a set list, D, is set initially to empty. Each set has n
slots to hold all the keys of any working set of forms found by the algorithm:


Let x in N represent the newly added form.
Add a set to D, with slot x' set to x'': x must belong to the working set.
Set node[x'] to true: check off node x' of the sketch graph.
for each e = (i, j) in E' such that edge[e] is false do
    if both node[i] and node[j] are false then
        for each set in D do
            for each (y,z) in E where y' = i and z' = j do
                copy the set
                set slot i to y'', slot j to z''
            delete the original set
    else if exactly one of node[i] and node[j] is false then
        /* without loss of generality, node[i] is true */
        for each set in D do
            for each (y,z) in E where y' = i and z' = j and
                    y'' is already in slot i of the set do
                copy the set
                set slot j to z''
            delete the original set
    else if node[i] and node[j] are true then
        for each set in D where y' = i, z' = j, y'' is in slot i,
                z'' is in slot j and (y,z) is not in E do
            delete the set
    set edge[e] to true
    set node[i] to true
    set node[j] to true
Check that forms of the same type are different.


If D is empty when the algorithm is finished, then no working sets were found. If D
is not empty, then the "first" set containing no duplicate keys is chosen as the working
set.
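The procedure above can be exercised on the example of Figure 14. The Python below is a sketch under stated assumptions, not the TLA implementation: the name find_working_sets, the folding of the three cases into one compatibility test, the rule for choosing the next edge, and the instance data (reconstructed from the narrative of Figure 15, with forms l and m left out) are all illustrative choices.

```python
SKETCH_NODES = [1, 2, 3, 4, 5]
SKETCH_EDGES = [(1, 2), (2, 3), (3, 5), (2, 5)]
# Assumed instance graph; each node is a (colour, key) pair.
NODES = [(1, "a"), (1, "b"), (2, "c"), (2, "d"),
         (3, "f"), (3, "g"), (4, "h"), (5, "p")]
EDGES = [
    ((1, "a"), (2, "c")), ((1, "b"), (2, "d")),
    ((2, "c"), (3, "f")), ((2, "d"), (3, "f")), ((2, "d"), (3, "g")),
    ((3, "f"), (5, "p")), ((3, "g"), (5, "p")),
    ((2, "c"), (5, "p")),
]


def find_working_sets(sketch_nodes, sketch_edges, nodes, edges, new, same_type=()):
    """Return every complete slot assignment {colour: key} containing `new`."""
    checked = {new[0]}              # the node checklist
    sets = [{new[0]: new[1]}]       # the set list D
    pending = list(sketch_edges)    # sketch edges not yet checked off
    while pending and sets:
        # Prefer an edge that leads from nodes already checked off.
        e = max(pending, key=lambda ij: (ij[0] in checked) + (ij[1] in checked))
        pending.remove(e)
        i, j = e
        survivors = []
        for s in sets:
            for (yi, yk), (zj, zk) in edges:
                if (yi, zj) != (i, j):
                    continue
                # Copy the set only if this edge agrees with the slots already
                # filled. With both slots filled, this is the "select" that
                # closes a cycle; with neither filled, every edge splits the set.
                if s.get(i, yk) == yk and s.get(j, zk) == zk:
                    survivors.append({**s, i: yk, j: zk})
        sets = survivors
        checked.update(e)
    # Sketch nodes touched by no edge: add each form matching that node.
    for i in sketch_nodes:
        if i not in checked:
            keys = sorted(k for c, k in nodes if c == i)
            sets = [{**s, i: k} for s in sets for k in keys]
    # Forms of the same type must be different forms.
    return [s for s in sets if all(s[a] != s[b] for a, b in same_type)]
```

Called with new=(5, "p") and same_type=[(3, 5)] on this data, the sketch passes through the same intermediate sets as Figure 15 (the cycle-closing edge (2,5) is merely taken before edge (1,2)) and leaves the single working set {a, c, f, h, p}.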


The station's owner may attempt to move some of the forms in the working set
while the interpreter is running. Each of the forms must therefore be set aside. Each
form in the working set is deleted from the system so that the only copy is the
interpreter's image of the form. If any of the forms cannot be found, then the
interpreter restores all the forms retained thus far, and aborts the forms procedure.
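The all-or-nothing behaviour described here can be sketched as follows. This is an illustration only: a Python dict stands in for the station's form file, and the name acquire_working_set is hypothetical; the actual OFS storage interface is not described in this paper.

```python
def acquire_working_set(station, keys):
    """Set aside every form named in keys, or none of them.

    station maps form keys to forms. On success the forms are removed from
    station and returned as the interpreter's private images. If any form is
    missing, the forms taken so far are restored and None is returned.
    """
    held = {}
    for key in keys:
        if key not in station:
            # A form has been moved away: restore what was taken and abort.
            station.update(held)
            return None
        held[key] = station.pop(key)
    return held
```

After a failed attempt the station holds exactly the forms it held before, so the owner sees no partial effect of the aborted procedure.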

If  all  the  forms  are  successfully  obtained,  then  the  interpreter  performs  the  set  of 
actions.  In  the  translation  phase,  the  legality  of  actions,  implied  actions  and  a  legal 
order  of  actions  have  already  been  determined. 

Actions may "fail" if a string is too long to be inserted in a given field, or if a form
is mailed to a non-existent station. In the former case, TLA chooses to insert the null
string by default, with the understanding that both humans and procedures are
intelligent enough to interpret this not as a value, but as a non-value. In the latter case, OFS
(and consequently TLA) returns the mail to the sending workstation. Since TLA
procedures are capable of recognizing the source of mail, it is presumed that this anomaly
could be appropriately dealt with if a user felt it necessary.

6.  Concluding  remarks 

TLA captures, in some sense, what is meant by an "automatic forms procedure".
The context of OFS limits the range of possible actions upon forms. There are also
many things that persons can do with OFS which have not been modelled in TLA.
Automatic procedures, for example, are not smart enough to expect the timely return
of a form which has been shipped away.

Form flow is determined by the particular configuration of procedures across the
system. Analytic tools are needed for determining some notion of "correctness"
[Tsichritzis 1981]. It is the responsibility of the users and a form administrator to model
and analyse the system to ensure that there are no undesirable side effects resulting
from some particular combination of automatic procedures. Such analysis should be
performed within a reasonable complexity bound and it should be performed
mechanically if at all possible.

The complexity of interpreting automatic procedures and form-gathering clearly
depends on (1) the size of the working set for a procedure, (2) the number of automatic
procedures running at workstations, and (3) the number of form images "waiting" in
the instance graphs of a workstation. The complexity of identifying a sketch graph
within the graph grows if the sketch graph is not merely a subgraph of the instance
graph. Obviously, whatever factors contribute to this complexity must be considered
in any "good office design". However, exactly what constitutes "good design", and to
what extent it is feasible, is not easily established.

Partly completed working sets of forms may or may not have a particular meaning
in terms of exceptions and errors. If forms are "missing" from a working set, the
present forms may also be part of another working set. The missing forms would
determine which procedure is to be activated. There is no way of telling which
procedure forms are missing until they arrive. Missing forms may never arrive. There is
no way of interpreting their absence as an error, except by placing some arbitrary
time limit upon form-gathering.

Forms may satisfy partly completed working sets for a number of procedures.
There is a need for some convenient way of displaying these sets. Users could
interpret what is "missing" and possibly act on this information. Instance graphs could be
quite complicated. Several partly completed sets may overlap in a single instance
graph. A graphic display would present this information in a much better fashion than
lists of form keys.

A simple feature that would increase user interaction with automatic procedures
would be a function whose value is determined by the user. When the interpreter sees
this function assigned to a field in an action sketch, it holds all the forms in the
working set. It then notifies the user when he next signs on, and waits until the user makes
a request to inspect the working set. At that point the user is allowed to assign a value
to the field (or possibly abort the procedure), and then execution will resume.

Form flow between stations in TLA is determined by the interplay of automatic
procedures. Flow of execution could be made more explicit by passing control between
procedures in different stations. One could then pass working sets of forms between
procedures. In this way we could explicitly determine the order of operations.
Procedures could then be called from other procedures without the need for
form-gathering. Decision points could be modelled by branching rather than by a variety of
similar working sets of forms. Which procedure is to be called could be decided by
evaluating a function whose arguments are field values from the working set.

Many office automation systems have been strongly influenced by the SBA [deJong
1980] and OBE [Zloof 1980] systems and Officetalk [Ellis & Nutt 1980]. The most
noticeable exceptions are SCOOP [Zisman 1979] and BDL [Hammer et al. 1977], which are,
however, more office systems programming languages than office worker's languages. TLA
follows this trend. It uses forms that are manipulated at workstations, like Officetalk,
and the non-procedural interface for defining procedures was in large part inspired by
the work of deJong and Zloof. However, TLA takes a somewhat different approach from
either.

A major goal of the TLA project was to provide a facility for automating office
procedures that could be used by office workers, as opposed to computer professionals,
with a minimum of training. As a result, there was an emphasis on providing familiar
concepts and a highly uniform interface.

The form is a very familiar concept to all office workers. Therefore, the idea of a
sketch is an easy one to teach. By contrast, the SBA notion of boxes is both useful and
powerful. However, it has no analog in the office of today, and therefore requires a
more expert office worker to use.

In QBE, conditions appear in a separate box from the tables of an application. By
contrast, TLA "conditions" (constraints) appear within a form itself. This difference is
not quite as minor as it seems; it reflects an underlying philosophy in the TLA project
that the user interface should be as uniform as possible. There are no separate
condition boxes attached to forms within the underlying manual system, and therefore there
are no separate conditions attached to sketches. Information that absolutely cannot
be obtained from the form fields (such as the source of the form) is specified using
pseudo-sketches that resemble forms as closely as possible.

Another difference between TLA and the IBM systems is that TLA, like its ancestors
OFS and MRS, runs on very small computers. Most of the development was done on an
LSI-11/23; the remainder was done on a "big machine", a PDP-11/45. This means that
the hardware required for TLA is affordable by any office large enough to benefit from
automation. At the same time, incremental growth can be easily achieved by adding
additional machines of a wide range of sizes to a local net.

Both OFS and TLA have been implemented on PDP-11's and LSI-11/23's running
under UNIX. Compatibility with OFS was maintained in TLA. Changes to code and the
internal representation of an OFS system were mostly additions of modules and UNIX
file directories. Where existing files and code were modified, compatibility was
maintained, so that OFS would simply ignore the added TLA features. Conversion costs from
an OFS system to one that supports TLA are negligible, and any TLA system could be
run with the OFS subset.

7.  References 

Attardi, G., Barber, G. and Simi, M., "Towards an Integrated Office Work Station",
MIT, 1980.


Cheung, C., "OFS -- A Distributed Office Form System with a Micro Relational
System", M.Sc. thesis, Department of Computer Science, University of Toronto,
1979.


Cheung, C. and Kornatowski, J., The OFS User's Manual, Computer Systems
Research Group, University of Toronto, 1980.


de  Jong,  P.,  "The  System  for  Business  Automation  (SBA):  A  Unified  Application 
Development  System",  Information  Processing  80,  Lavington,  S.H.  (ed.),  North- 
Holland,  The  Hague,  1980. 


Ellis, C.A. and Nutt, G.J., "Computer Science and Office Information Systems",
Computing Surveys, March 1980.


Gibbs,  S.,  "OFS:  An  Office  Form  System  for  a  Network  Architecture",  M.Sc.  thesis, 
Department  of  Computer  Science,  University  of  Toronto,  1979. 


Gibbs, S., The OFS Programmer's Manual, Computer Systems Research Group,
University of Toronto, 1980.


Hammer, M., Howe, W.G., Kruskal, V.J. and Wladawsky, I., "A Very High Level
Programming Language for Data Processing Applications", Comm. ACM 20, 11 (1977),
pp. 832-840.


Hammer, M. and Kunin, J.S., "Design Principles of an Office Specification
Language", MIT paper, 1979.


Hogg, J., "TLA: A System for Automating Form Procedures", M.Sc. thesis,
Department of Computer Science, University of Toronto, 1981.


Hudyma, R., "Architecture of Microcomputer Distributed Database Systems",
M.Sc. thesis, Department of Computer Science, University of Toronto, 1978.


Hudyma, R., "The Hardware Design of Distributed Office Workstations" in A
Panache of DBMS Ideas III, Technical Report 111, Computer Systems Research
Group, University of Toronto, 1980.


Kernighan, B.W. and Ritchie, D.M., The C Programming Language, Prentice-Hall,
Englewood Cliffs, New Jersey, USA, 1978.


Kornatowski, J.Z., The MRS User's Manual, Computer Systems Research Group,
University of Toronto, 1979.


Ladd, I., "A Distributed Database Management System Based on
Microcomputers", M.Sc. thesis, Department of Computer Science, University of Toronto, 1979.


Ladd, I. and Tsichritzis, D., "An Office Form Flow Model" in 1980 NCC proceedings.


Metcalfe, R.M. and Boggs, D.R., "Ethernet: Distributed Packet Switching for Local
Computer Networks", Comm. ACM 19, 7 (1976), pp. 395-404.


Morgan, H.L., "Research and Practice in Office Automation", Department of
Decision Sciences, The Wharton School, University of Pennsylvania, Philadelphia, PA,
USA, 1980.


Nierstrasz, O.M., "Automatic Coordination and Processing of Electronic Forms in
TLA", M.Sc. thesis, Department of Computer Science, University of Toronto,
1981.


Peterson, J.L., "Petri Nets", ACM Computing Surveys 9, 3 (1977), pp. 223-252.


Ritchie, D.M. and Thompson, K., "The UNIX Time-Sharing System", The Bell
System Technical Journal, Vol. 57, #6 (July-August 1978), pp. 1905-1929.


Zisman, M.D., "Representation, Specification and Automation of Office
Procedures", PhD dissertation, Wharton School, University of Pennsylvania, 1977.


Zloof, M.M., "Query by Example", AFIPS Conference Proceedings, Vol. 44,
NCC, 1975.


Zloof, M.M., "A Language for Office and Business Automation", IBM Research
Report, IBM Thomas J. Watson Research Centre, Yorktown Heights, New York,
USA, 1980.


OUTPUT  GENERATION  IN  OFFICE  INFORMATION  SYSTEMS 


Simon  Gibbs 

Computer Systems Research Group


ABSTRACT

This paper discusses a model for output generation within an Office Information
System. This model contains an internal conceptual representation which may
be transformed into various external representations. The transformations and
representations used in the generation of English utterances are described in
some detail. The use of the model by form oriented Office Information Systems
is also discussed.


1  Introduction

One of the problems facing the designer of an Office Information
System (OIS) is the need for both extensibility and simplicity (see, for example, Ellis
and Nutt in [ELL80]). The first requirement arises from the dynamic nature of
the office environment; the functions and size of the office may contract or
expand, regulations which govern its operation may change as may the
organizational structure of the office itself. In all such cases the OIS must adapt by
allowing its data structures, procedures and physical configuration to be
modified to reflect these changes. The second requirement, OIS simplicity,
refers not to the system itself, but to the interface presented to the user. The
internal complexity of the OIS must be screened from the user by a simple,
"friendly", user interface.

This paper presents a model, for the output generation component of
the OIS, which attempts to satisfy both the extensibility and simplicity
constraints. As a concrete example of use of the model, the generation of English
utterances is described. Finally an application of the model to OIS forms is
briefly discussed.


2  Output  Generation  Model 

An external representation is defined as a representation of
information which, when physically realized, is comprehensible to a human user. The
output generator is that part of the OIS which is concerned with transforming
information from an internal machine representation to an external
representation. Note that the output generator does not determine which information is to
undergo this transformation; this is the function of the analysis components of
the OIS.


The manner in which information is represented to an external user
will vary with the circumstances. For example, if the user is sitting in front of a
video terminal then an alphanumeric text or table representation could be used.
If however the user is not near a display terminal or is likely to be gazing in
some other direction then an audio representation may be more appropriate.
This choice of representation also depends on the nature of the information, i.e.,
certain information may be more clearly expressed by using graphs or tables,
while for other information a pictorial or textual representation is the most
natural.


In general the transformations needed to realize each external
representation will differ. However this need not be the case for their
underlying internal machine representation; it should be possible to have a single
internal representation from which the various external representations are derived
by successive transformation. Such an internal representation is, of course,
device independent, but more importantly it is also independent of the external
representation used by the device. This implies that our internal representation
must, in some manner, capture the meaning or semantics of the information.
We call such a representation a conceptual representation.




The sequence of transformations necessary to map from a
conceptual representation to an external representation may be long and complex. In
order to simplify the analysis of this process we may hypothesize a number of
intermediate representations. Quite often these representations are tree-like
and of a structural nature; for this reason we will call such an intermediate
representation a structural representation.

Combining the above components gives the output generator
structure of Figure 1. In this figure the transformations have been divided into three
steps. The first step, which consists of a single transformation, takes a
conceptual item [1] and transforms it into a structural item. During the second step a
series of structural transformations is applied; the result of these
transformations is still a structural item. Finally the last step, also consisting of a single
transformation, produces an external item. In the following sections the various
components of the output generator will be discussed in more detail.
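The three steps can be pictured as a composition of functions. The Python below is a minimal sketch and not part of the model itself: the nested-tuple trees, the toy conceptual item and every function name are assumptions chosen for illustration.

```python
from functools import reduce


def make_output_generator(to_structural, structural_steps, to_external):
    """Compose the three steps into a single function."""
    def generate(conceptual_item):
        item = to_structural(conceptual_item)        # step 1: one transformation
        item = reduce(lambda acc, step: step(acc),   # step 2: a series of
                      structural_steps, item)        #   structural transformations
        return to_external(item)                     # step 3: one transformation
    return generate


# Hypothetical transformations for a toy example.
def to_structural(concept):
    return ("S",
            ("NP", concept["agent"]),
            ("VP", ("V", concept["relation"]), ("NP", concept["object"])))


def add_full_stop(tree):
    """A structural transformation: a tree goes in and a tree comes out."""
    return tree + (("PUNCT", "."),)


def leaves(tree):
    """Yield the terminal strings of a nested-tuple tree from left to right."""
    for child in tree[1:]:
        if isinstance(child, tuple):
            yield from leaves(child)
        else:
            yield child


def to_external(tree):
    return " ".join(leaves(tree))


generate = make_output_generator(to_structural, [add_full_stop], to_external)
```

Only to_external knows anything about the form of the output, so a different external representation could be obtained by replacing that one function.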


2.1  Conceptual Representation

Various models for representation of meaning have been developed
by computer scientists and linguists interested in the problem of man-machine
communication using natural language. These models, such as semantic
networks [SIM73] and conceptual dependencies [SCH75], have a strong linguistic
flavour since they are essentially concerned with representing the meaning of
natural language sentences.

Schank, however, does consider the characteristics required by more
general conceptual representations. These are unambiguity, uniqueness, and
psychological and computational validity.

A conceptual item is unambiguous when only one meaning may be
assigned to it. It is unique if there is no other item with the same meaning. That
the representation of meaning be unambiguous is necessary so that the external
user will assign the same meaning to a conceptual item as the machine;
uniqueness is necessary in order for the representation to be consistent.

Psychological validity stipulates that the representation be similar to
that used by humans. While this requirement is necessary if psychological
processes are to be modelled it is not at all necessary in office systems and so
will be discarded.

Computational validity is perhaps the most important requirement; it
means that a computer should be able to operate efficiently with the
representation. In practice this implies that the representation must consist of a small
number of primitive abstract units, the semantics of which is assumed to be
[illegible]


[1] A conceptual item simply refers to information that is conceptually
represented, similarly for structural and external items.


The primitive units of a meaning representation are such things as
objects, properties, and relationships. The various models differ in how these
are organized, although in general a network-like structure results.

2.2  Structural  Representations 

A structural item is a hierarchical tree-like representation of the
information contained in a conceptual item. Associated with each structural
representation is a set of production rules (P-rules) which define the allowable
relationships between constituents of the hierarchy. The P-rules may also be
thought of as generating all structural items within the representation. Indeed
it is possible to uniquely specify a structural item by the sequence of P-rules
which generate it.
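The claim that a sequence of P-rules specifies a structural item can be illustrated with a small sketch. The four rules below are hypothetical and are not taken from any representation discussed in this paper; the sketch also assumes that rules are applied to the leftmost unexpanded constituent.

```python
# Hypothetical P-rules: name -> (constituent, list of sub-constituents).
P_RULES = {
    "P1": ("S", ["NP", "VP"]),
    "P2": ("NP", ["name"]),
    "P3": ("VP", ["V", "NP"]),
    "P4": ("V", ["verb"]),
}
NONTERMINALS = {lhs for lhs, _rhs in P_RULES.values()}


def derive(rule_sequence, start="S"):
    """Build the structural item generated by rule_sequence, leftmost first."""
    rules = iter(rule_sequence)

    def expand(symbol):
        if symbol not in NONTERMINALS:
            return symbol                    # a terminal category stays a leaf
        name = next(rules, None)
        if name is None:
            raise ValueError("rule sequence is too short")
        lhs, rhs = P_RULES[name]
        if lhs != symbol:
            raise ValueError(f"rule {name} cannot expand {symbol}")
        return (symbol, name, [expand(child) for child in rhs])

    tree = expand(start)
    if next(rules, None) is not None:
        raise ValueError("rule sequence is too long")
    return tree
```

With these rules the sequence P1, P2, P3, P4, P2 yields one tree and no other, while a sequence that does not fit the rules is rejected.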

Since structural representations are used only during the
intermediate stages of output generation they are invisible to the external user. Thus we
are free to choose whichever representation is the most convenient, i.e., assists
us the most in performing the transformation from conceptual to external.
Whether or not the structural representations we choose correspond to how a
human would describe the structure of a concept is of no consequence.


2.3  External  Representations 

Information is presented to the user by physically realizing a linear
sequence of signals on some output device. The user may perceive this
information in a number of ways. Although, often, this perception is also of a linear
sequence (of spatially ordered symbols for text or temporally ordered sounds
for voice), this is not always the case; graphic images, for example, may be
perceived as a multidimensional ordering of symbols. However in all cases there is
a linear sequence of signals which gives rise to this perception.

An  external  item  then  is  simply  a  string  of  symbols.  The  function  of 
the  output  generator  is  to  linearize  a  conceptual  item  by  transforming  it  to  such 
a  string. 


2.4  Conceptual to Structural Transformation

The initial transformation to a structural representation involves
determining the sequence of P-rules which generate a structural description of a
conceptual item. One approach is to apply P-rules in a top-down manner. After
each P-rule application a comparison is made between the partially generated
structural item and the original conceptual item; if the two do not correspond
the effects of the P-rule are undone and a different P-rule selected for
application (such backtracking may be more than one level deep). This method is
essentially a random generation of structural items which terminates only when
the correct item has been generated. Obviously, only for extremely simple
cases will such a method be of use.

What is needed is a method which chooses the correct P-rules based
upon the input conceptual item; the transformation is then "semantically
driven", and requires a specification of the mapping between the primitive units
of the conceptual representation and the constituents of the structural
representation.

Thus two components are needed to implement the transformation
efficiently: a mapping which relates the two representations and a generator
(not to be confused with the output generator) which selects and applies P-rules.
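A semantically driven transformation built from these two components might look as follows. This is a sketch under assumptions: the conceptual item is modelled as a nested dict whose "unit" field names a primitive unit, and the MAPPING table, the rule names and the function generate are all hypothetical.

```python
# The mapping: primitive conceptual unit -> (P-rule, constituent, ordered parts).
MAPPING = {
    "relationship": ("P1", "S", ["agent", "name", "object"]),
    "object": ("P2", "NP", ["name"]),
}


def generate(item):
    """The generator: the unit type of item selects the P-rule, with no search."""
    p_rule, constituent, parts = MAPPING[item["unit"]]
    children = [
        item[part] if isinstance(item[part], str) else generate(item[part])
        for part in parts
    ]
    return (constituent, p_rule, children)


# A toy conceptual item with one relationship between two objects.
CONCEPT = {
    "unit": "relationship",
    "name": "owes",
    "agent": {"unit": "object", "name": "Smith"},
    "object": {"unit": "object", "name": "$10"},
}
```

Each P-rule is chosen by a single table lookup, so nothing is ever undone; this is the contrast with the backtracking method described earlier in this section.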


2.5  Structural  Transformations 

The remarks made in the previous section also hold for structural
transformations. That is, the transformation contains two components, one
a mapping (now between two structural representations) and the other a
generator.


2.6  Output  Transformation 

The final component of the output generator is concerned with
transforming from a structural representation to an external representation. A
structural item, as viewed by an output transformation, is a sequence of
composition rules (C-rules). The transformation can be performed by traversing (in
preorder) the item and applying C-rules as they are encountered. In this
manner a linear sequence of symbols is obtained which can be sent directly to
the device. C-rules are device dependent; the code that implements these rules
will vary from device to device.

As with the other transformations we can identify two components:
the first is a mapping between P-rules and C-rules and the second is a generator
which now simply traverses structural items and applies C-rules.
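The output transformation can be sketched in the same style. Everything named below is an assumption made for illustration: the P-rule to C-rule table, the two sets of device-dependent C-rules, and the tree layout (constituent, P-rule, children).

```python
# Device-independent mapping from P-rules to C-rules (hypothetical names).
P_TO_C = {"P1": "sentence", "P2": "phrase"}

# Device-dependent C-rule implementations. Each returns the symbols to emit
# when its node is visited; "leaf" handles terminal strings.
TEXT_C_RULES = {
    "sentence": lambda: [],
    "phrase": lambda: [],
    "leaf": lambda text: [text],
}
VOICE_C_RULES = {
    "sentence": lambda: ["<pause>"],
    "phrase": lambda: [],
    "leaf": lambda text: [text.upper()],
}


def linearize(tree, c_rules):
    """Preorder traversal: apply the node's C-rule, then visit its children."""
    if isinstance(tree, str):
        return c_rules["leaf"](tree)
    _constituent, p_rule, children = tree
    symbols = list(c_rules[P_TO_C[p_rule]]())
    for child in children:
        symbols += linearize(child, c_rules)
    return symbols


ITEM = ("S", "P1", [("NP", "P2", ["Smith"]), "owes", ("NP", "P2", ["$10"])])
```

The traversal and the P-rule to C-rule mapping are shared by both devices; only the table of C-rule implementations changes from one device to the other.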


In  the  introduction  of  this  paper  it  was  claimed  that  our  model  for 
output  generation  would  improve  the  extensibility  of  the  OIS  and  lead  to  a 
simpler  interface  for  the  user.  These  claims  will  now  be,  at  least  partially, 
justified. 


It has been pointed out that the extensibility of a system is strongly
related to the modularity of knowledge within the system [SUS75]; in particular
if knowledge is "hardwired" in then the system may be very efficient but will also
suffer from restricted applicability. In the case of the output generator the only
knowledge that is "hardwired" is that part of the generator which operates on
the conceptual representation. Other representations may be added by
specifying the transformations to these representations.

To clarify this let us consider three types of extension to an OIS.
First suppose that a new device is to be added to the system; also assume that
the external representation used by this device is already known to the system.
(As an example, consider adding a raster scan display to a system which already
uses vector displays.) In this case the intermediate structural representations,
the output transformation, and so on are already present. All that remains is to
write a version of the C-rules for the new device.

A second type of extension occurs when we wish to add an external
representation. For such an extension it is necessary to specify all the transformations
between the conceptual representation and the new external representation.
This will be more difficult than the previous case; however, any changes
that occur will be isolated to the output generator and have no effect on the
other OIS components.

The previous extensions have been of a physical nature; it is also possible
that the logical structure of the OIS will be altered. In this case new information
will be added at the conceptual level; for example, a new data structure
and its "meaning" could be added. This type of extension is then completely
independent of the output generator (assuming no new external representations
are desired).

Simplicity of the user interface is not directly derivable from our
model; rather, the model provides the capability for designing such an interface
since there is great freedom in choosing external representations.


3 Generation of English Utterances

In this section our model will be applied to those complex transformations
which generate speech output. At present there are many implementations
of the various steps in the transformation process [GOL75, SIM72, WOO70,
WIT79], and at least one system, SSC [YOU79], has tied all the steps together.

3.1  Representations 

SSC (Speech Synthesis from Concept) was "designed specifically for
providing speech output from information systems" and has been tested using a
real database. In SSC there are three intermediate structural representations:
a deep structure P marker, a surface structure P marker, and a phonological
marker. This division will be adhered to in the following, although the representations,
transformations, and terminology used here differ somewhat from SSC.


3.1.1  Conceptual  Dependency 

Conceptual  Dependency  (CD)  is  a  conceptual  representation 
developed  by  Schank  and  his  co-workers  [SCH75].  The  primitive  units  are 
objects, attributes, actions, and various relationships that may occur between
these. In CD, as with all conceptual representations, there is no reference to a
given language, i.e., the representation is "language-free" (as one would intuitively
expect of a meaning representation). One of the achievements of CD is
that the number of actions and relationships is very small (about 14 different
actions and 16 different relationships) yet the expressive range is quite large.

The basic CD relationship is an EVENT and consists of an object
(known as an ACTOR) related to an action (an ACT). CD uses a graphic notation
for "conceptualizations" (i.e., conceptual items); an EVENT is represented as
ACTOR<==>ACT, which is intended to mean that ACTOR performs ACT. As mentioned
above, CD uses a small, fixed number of ACTs, so complex verbs must be broken
down to these primitive actions. Examples of ACTs are: PTRANS - the
transfer of the physical location of an object, ATRANS - the transfer of an
abstract property (e.g., ownership) of an object, and MBUILD - to create or combine
thoughts.


In  CD  a  conceptualization  may  be  further  characterized  by  associat¬ 
ing  a  time  and  location  to  it,  also  other  objects  may  participate  in  the  conceptu¬ 
alization  and  their  roles  can  be  specified.  Finally  it  is  also  possible  to  specify 
certain  relationships  (such  as  causality)  between  conceptualizations. 


One  unusual  feature  in  CD  is  the  treatment  of  an  object’s  attributes, 
known  as  "states"  in  CD  terminology.  A  state  may  be  some  absolute  quantity, 
such  as  size  or  mass,  a  relation  such  as  ownership  or  containment,  or  a  scale 
such  as  health  or  anger.  In  the  case  of  scales  a  rather  arbitrary  numerical  value 
is  assigned  to  the  various  adjectives  along  the  scale,  for  example: 


HEALTH          from -10 to 10

dead            -10
gravely ill     -9
sick            -9 to -1
etc.


Using such a scale, the conceptualization for "John killed Mary" literally would
be "John performed some action which caused Mary's health to change to -10."
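
To make the structure of such a conceptualization concrete, the following sketch encodes it as a causal link between an EVENT and a change of state. The class names, the field names, and the use of "DO" to stand for an unspecified action are our own assumptions for illustration; only the scale values are taken from the text above.

```python
from dataclasses import dataclass

# Two points on the HEALTH scale, as given in the text.
HEALTH = {"dead": -10, "gravely ill": -9}

@dataclass
class Event:                 # ACTOR <==> ACT
    actor: str
    act: str                 # "DO" is our placeholder for an unspecified action

@dataclass
class StateChange:
    obj: str
    attribute: str
    new_value: int

@dataclass
class Causes:                # causal relationship between two conceptualizations
    cause: Event
    effect: StateChange

john_killed_mary = Causes(
    cause=Event(actor="John", act="DO"),
    effect=StateChange(obj="Mary", attribute="HEALTH", new_value=HEALTH["dead"]),
)

def paraphrase(c):
    """Render the literal reading of a causal conceptualization."""
    return (f"{c.cause.actor} performed some action which caused "
            f"{c.effect.obj}'s {c.effect.attribute.lower()} "
            f"to change to {c.effect.new_value}.")

print(paraphrase(john_killed_mary))
```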


The syntax for conceptualizations has only been described briefly
here. For other than very simple sentences the conceptualizations can appear
quite complicated. An example is shown in Figure 2.a; in English this conceptualization
could be represented by "John went to New York by train."


3.1.2  Deep  Structure  P  Marker 

There is some disagreement in the literature over the name to be
used for the representation we now encounter. Simmons originally developed
the representation and refers to it as a "semantic network" [SIM72]; Goldman in
later work uses the term "syntax net" [GOL75]. Since this representation is
structural (based upon case grammar) and not really a network, we will call it a
"deep structure P marker" (this is also the terminology used in SSC).

Recall  from  section  2.2  that  a  structural  representation  is  specified 
by  a  set  of  P-rules.  The  P-rules  for  this  representation  are: 

S  -->  node*

node  -->  node + relation_set
        |  terminal_element
relation_set  -->  (relation + node)*

Here S is the start symbol and the asterisk indicates one or more repetitions. A
relation  may  be  an  intersentential  connective  ("since",  "because",  etc.),  a  lexical 
marker  (tense,  aspect,  etc.),  or  a  deep  case  relation.  A  terminal  element  is  usu¬ 
ally  a  word  reference  although  it  may  also  be  the  value  of  a  lexical  marker  (past, 
present,  progressive  etc.). 

Case  grammar  was  first  proposed  by  Fillmore  [FIL68]  and  has  gained 
great  popularity  within  the  artificial  intelligence  community  [BRU75].  The  pur¬ 
pose  of  case  grammar  is  to  indicate  the  semantic  relationships  between  the  con¬ 
stituents  of  a  sentence,  in  particular  between  verbs  and  nouns  or  noun  phrases. 

A deep case may be defined as a property whose value is usually
specified for a given type of event [BRU75]. Originally Fillmore proposed six
cases but suggested that more would likely be necessary. The number used by
actual systems varies. However, the following are often present:

Agentive - instigator of the action,

Instrumental - stimulus or immediate physical cause of the action,

Dative - entity affected by the action.

For example, in "John broke the window with a rock.", we have "John" as the
agentive case, "the window" as the dative, and "a rock" as the instrumental. In
case grammar a verb is usually represented by a "case frame", which is a list of
cases the verb can take, some of which may be optional. In the previous example
the verb is "to break" and its case frame would include the three cases from
above; however, the only case that is mandatory is the dative, as can be seen in
"The window was broken."

Figure 2.b shows the deep structure P marker corresponding to the
conceptualization of Figure 2.a. As can be seen, this representation is in fact a
tree with labelled branches. Often the representation is written in tabular form:


N1:   LEX go       AGT N2      DIR N3      INST N5      TENSE past
N2:   LEX John
N3:   PREP to      POBJ N4
N4:   LEX New York
N5:   PREP by      POBJ N6
N6:   LEX train

Here LEX gives the lexical value of a node and TENSE the tense of the sentence.
AGT, DIR, and INST are deep cases: the agentive, directive, and instrumental
respectively. Deep cases usually appear on the surface as prepositional phrases;
PREP gives the preposition and POBJ the prepositional object. N1, N2 and so on
are simply the names of the nodes and have no bearing on the surface structure.
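
The tabular form maps naturally onto a table of nodes, each holding a list of (relation, value) pairs. The sketch below is a hypothetical encoding of our own; the LEX value "go" for N1 is our reading of the example sentence. It also checks the claim that the marker is a tree, i.e., that every node is reached exactly once from the root.

```python
# Each node name maps to (relation, value) pairs; a value is either another
# node name or a terminal element such as a word or a tense.
marker = {
    "N1": [("LEX", "go"), ("AGT", "N2"), ("DIR", "N3"), ("INST", "N5"), ("TENSE", "past")],
    "N2": [("LEX", "John")],
    "N3": [("PREP", "to"), ("POBJ", "N4")],
    "N4": [("LEX", "New York")],
    "N5": [("PREP", "by"), ("POBJ", "N6")],
    "N6": [("LEX", "train")],
}

def is_tree(marker, root="N1"):
    """True if every node is reachable from the root by exactly one path."""
    seen = set()
    def visit(name):
        if name in seen:                 # a second path to the same node
            return False
        seen.add(name)
        return all(visit(v) for _, v in marker[name] if v in marker)
    return visit(root) and seen == set(marker)

def branches(marker, name="N1"):
    """The labelled branches leaving a node (relations whose value is a node)."""
    return {rel: v for rel, v in marker[name] if v in marker}

print(is_tree(marker), branches(marker))
```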

3.1.3 Surface Structure P Marker

This representation corresponds very closely to the traditional
theory of grammatical structure. A sentence is broken into noun and verb
phrases which contain the subject and object respectively. The following is a
simple example (which closely resembles a grammar used by Woods [WOO70]) of
the P-rules for a surface structure grammar:

S   -->  NP + (AUX) + VP
      |  AUX + NP + VP

NP  -->  (DET) + (ADJ*) + N + (PP*)

PP  -->  PREP + NP
VP  -->  V + (NP) + (PP*)

Here  the  brackets  indicate  that  the  enclosed  item  is  optional.  This  grammar 
uses  the  conventional  syntactic  categories: 


S     - a sentence (the start symbol),
NP    - noun phrase,
VP    - verb phrase,
PP    - prepositional phrase,
PREP  - preposition,
AUX   - auxiliary verb,
DET   - determiner,
ADJ   - adjective,
N     - noun,
V     - verb.

Figure 3.a is an example of a surface structure P marker; this is the representation
for the conceptualization of Figure 2.a.
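
As a rough check on the P-rules above, each rule can be written as a regular expression over the category labels of a node's children. The tree below is one plausible marker for the example sentence; it is our own assumption and is not copied from Figure 3.a.

```python
import re

# Each P-rule as a pattern over the space-separated labels of a node's children.
P_RULES = {
    "S":  r"NP( AUX)? VP|AUX NP VP",
    "NP": r"(DET )?(ADJ )*N( PP)*",
    "PP": r"PREP NP",
    "VP": r"V( NP)?( PP)*",
}

# A node is (category, children); a lexical category carries a word instead.
tree = ("S", [
    ("NP", [("N", "John")]),
    ("VP", [("V", "went"),
            ("PP", [("PREP", "to"), ("NP", [("N", "New York")])]),
            ("PP", [("PREP", "by"), ("NP", [("N", "train")])])]),
])

def conforms(node):
    """True if every phrase node expands according to its P-rule."""
    category, body = node
    if isinstance(body, str):
        return category not in P_RULES
    labels = " ".join(child[0] for child in body)
    return (category in P_RULES
            and re.fullmatch(P_RULES[category], labels) is not None
            and all(conforms(child) for child in body))

def words(node):
    """Read the words off the leaves from left to right."""
    _, body = node
    return [body] if isinstance(body, str) else [w for c in body for w in words(c)]

print(conforms(tree), " ".join(words(tree)))
```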


3.1.4  Phonological  Marker 

If the external representation to be used were text then the previous
representation would suffice, for by applying a simple output transformation to
the surface structure P marker one can print the desired sentence. However, if
speech is to be used as the external representation then another intermediate
step is needed.

Before  introducing  this  new  representation  it  is  useful  to  briefly 
review  voice  synthesis.  There  are  three  principal  methods  for  performing  voice 
synthesis [FLA70], namely by digitized waveform, formant synthesis (or word
concatenation), and phoneme synthesis (or linear predictive coding). The first
method  requires  the  storage  of  a  digital  representation  of  the  speech  waveform 
which  is  then  converted  to  an  analog  signal  when  desired.  Formant  synthesis 
uses  a  set  of  word  parameters  which  are  slowly  varying  relative  to  the  duration 
of  the  word.  During  synthesis  the  parameters  are  retrieved  on  a  word  by  word 
basis  and  the  audio  signal  for  the  word  recreated.  The  third  method,  phoneme 
synthesis,  is  similar  to  formant  synthesis  except  that  now  the  parameters  are 
stored  per  phoneme.  During  synthesis  the  phonetic  transcription  of  a  word  is 
first retrieved and then the parameters for these phonemes.

The three methods differ drastically in the amount of storage used;
for example, one million bits will provide 20 seconds of speech using digitized
waveforms, 17 minutes using formant synthesis, and 4 hours if phoneme synthesis
is used [FLA70]. Clearly phoneme synthesis is the most space efficient,
which is an important consideration, particularly if a large vocabulary is expected.
Unfortunately phoneme synthesis is also the most difficult of the three methods,
yet it is not impossible and there are commercially available chips which contain
the synthesis algorithm [FON81]. In the remainder of this paper, when
voice synthesis is discussed it will be assumed that this method is being used.
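
The storage figures quoted from [FLA70] can be restated as data rates, which makes the comparison easier to see. The short calculation below uses only the numbers given above; the variable names are our own.

```python
BUDGET_BITS = 1_000_000

# Seconds of speech obtainable from one million bits, per the figures above.
seconds = {
    "digitized waveform": 20,
    "formant synthesis": 17 * 60,
    "phoneme synthesis": 4 * 3600,
}

# Implied storage rate in bits per second of speech.
rates = {method: BUDGET_BITS / s for method, s in seconds.items()}

for method, rate in rates.items():
    print(f"{method}: about {rate:.0f} bits per second of speech")

# How many times more speech phoneme synthesis stores than digitized waveforms.
ratio = seconds["phoneme synthesis"] / seconds["digitized waveform"]
```

On these figures a digitized waveform needs 50,000 bits per second of speech, formant synthesis roughly 980, and phoneme synthesis roughly 69, a factor of 720 between the two extremes.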

Phonologists use two sets of features when they describe the sounds
of speech. The first set of features are known as segmentals since they apply to
single phonemes (the "segments" of speech). The second set, suprasegmentals
(or prosodics), involve more than single phonemes; these include intonation,
rhythm, and stress.

For  synthesis  of  speech  of  good  quality  it  is  necessary  to  consider 
suprasegmental  features,  the  phonological  marker  is  a  representation  which 
accounts  for  these  features.  There  are  a  number  of  "rule-based"  synthesis  sys¬ 
tems  which  derive  the  suprasegmentals  from  syntactic  information  such  as  a 
surface structure P marker [YOU80]. The phonological marker described here
is used in Witten's system [WIT79].

The set of P-rules which defines this representation is:

S  -->  tone_group*

tone_group  -->  foot*

foot  -->  syllable*
syllable  -->  phoneme*

The tone group [HAL67] is the basic unit of intonation; that is, each tone group
will have its own intonation pattern. English uses a small number of intonation
patterns (rising, falling, etc.). The P-rules do not indicate which pattern is associated
with a given tone group, but such information would be contained in the
phonological marker. The foot [ABE65] is the basic unit of rhythm; it typically
consists of two syllables or a syllable and a pause. The syllable is the unit of
stress; there is also a special syllable within a tone group if a major change of
pitch occurs (the "tonic syllable"). The phonological marker would indicate both
the stress of a syllable and whether it is a tonic syllable.
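
The nesting described by these P-rules, together with the extra information carried by the marker, might be encoded as follows. This is a sketch under our own assumptions: the class and field names are hypothetical, the division into feet is purely illustrative, and ordinary spelling letters stand in for phonemes rather than any real phonetic transcription.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Syllable:
    phonemes: List[str]
    stressed: bool = False
    tonic: bool = False          # marks the tonic syllable of a tone group

@dataclass
class Foot:
    syllables: List[Syllable]

@dataclass
class ToneGroup:
    intonation: str              # e.g. "rising" or "falling"
    feet: List[Foot]

# "John went to New York" as a single tone group (illustrative only).
marker = [
    ToneGroup("falling", [
        Foot([Syllable(list("jon"), stressed=True),
              Syllable(list("went")),
              Syllable(list("tu"))]),
        Foot([Syllable(list("nu"), stressed=True)]),
        Foot([Syllable(list("york"), stressed=True, tonic=True)]),
    ]),
]

def phonemes(marker):
    """Flatten the marker into the linear phoneme sequence."""
    return [p for tg in marker for ft in tg.feet
            for sy in ft.syllables for p in sy.phonemes]

def tonic_count(tone_group):
    return sum(sy.tonic for ft in tone_group.feet for sy in ft.syllables)

print("".join(phonemes(marker)), tonic_count(marker[0]))
```

Under this encoding a question and a statement with the same words would differ only in the intonation field and in which syllables are marked.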

Figure 3.b shows one possible phonological marker for the sentence
from Figure 3.a. For clarity the marker is not shown with separate nodes for
each phoneme. The phonetic transcription is based upon the International
Phonetic Alphabet (IPA) except that standard characters are used for the
phonetic symbols [FON81].


3.2 Transformations

The transformations between representations will now be described.
The procedures used in performing these transformations vary among implementations
and so will not be described in as much detail as the representations.


3.2.1 Conceptual Dependency to Deep Structure P Marker

The BABEL program of Goldman [GOL75] generates English sentences
from conceptualizations. The first step in this generation is the transformation
to a deep structure P marker.

As mentioned in section 2.4, two components are needed to specify
such  a  transformation  -  a  mapping  and  a  generator.  In  BABEL  the  mapping 
between  representations  is  derived  from  two  data  structures,  discrimination 
nets  (D-nets)  and  the  CONCEXICON  (CONCEptual  leXICON). 

A  D-net  is  a  decision  tree  with  pointers  to  CONCEXICON  entries  at  the 
leaves.  Each  non-leaf  node  is  a  true/false  condition  which  may  be  matched 
against  a  conceptualization  or  a  memory  model.  By  applying  a  D-net  to  a  con¬ 
ceptualization  a  CONCEXICON  entry  is  selected  for  the  ACT  contained  in  the  con¬ 
ceptualization  (assuming  we  have  an  EVENT).  A  CONCEXICON  entry  has  two  com¬ 
ponents,  a  pointer  to  a  lexicon  entry  and  case  frame  information. 

The transformation generator takes as input a conceptualization and
proceeds to apply D-nets. When a CONCEXICON entry has been selected the generator
"fills in" the case frame using information from the conceptualization. It
is at this point that the nodes of the deep structure P marker are generated.
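
The interplay of D-nets, CONCEXICON entries, and case frames can be sketched as below. Everything specific here is hypothetical: the single test in the D-net, the entry names GO and MOVE, the OBJ case, and the slot names of the conceptualization are our own inventions and are not taken from BABEL.

```python
# A D-net node is (condition, subtree_if_true, subtree_if_false);
# a leaf is the name of a CONCEXICON entry.
DNETS = {
    "PTRANS": (lambda c: c["actor"] == c["object"], "GO", "MOVE"),
}

# A CONCEXICON entry: a lexicon pointer and case frame information
# (deep case -> slot of the conceptualization that fills it).
CONCEXICON = {
    "GO":   {"lex": "go",   "frame": {"AGT": "actor", "DIR": "to"}},
    "MOVE": {"lex": "move", "frame": {"AGT": "actor", "OBJ": "object", "DIR": "to"}},
}

def select(dnet, concept):
    """Walk the decision tree until a leaf (an entry name) is reached."""
    while not isinstance(dnet, str):
        condition, if_true, if_false = dnet
        dnet = if_true if condition(concept) else if_false
    return dnet

def to_deep_structure(concept):
    """Select an entry for the ACT, then fill in its case frame."""
    entry = CONCEXICON[select(DNETS[concept["act"]], concept)]
    node = {"LEX": entry["lex"]}
    node.update({case: concept[slot] for case, slot in entry["frame"].items()})
    return node

john_goes = {"act": "PTRANS", "actor": "John", "object": "John", "to": "New York"}
print(to_deep_structure(john_goes))
```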


3.2.2  Deep  Structure  P  Marker  to  Surface  Structure  P  Marker 

The transformation between these two representations is perhaps the
easiest to express in terms of our model. Simmons and Slocum [SIM72] first
developed and implemented this transformation. Their generator is a recursive
transition network corresponding to the P-rules of the surface structure P
marker. This network is then augmented by adding condition-action pairs to the
arcs (such a network is known as an ATN or augmented transition network
[WOO70]). The mapping between representations is implicit in these pairs of
conditions and actions. In general there is an action for each semantic relation
used in the deep structure P marker (AGT, DIR, etc.) and the transitions
correspond to the names of these relations. A transition is allowed if the deep
structure P marker contains the relation specified by the arc condition. When
the transition occurs the action for that relation is performed; it is these actions
which build the surface structure P marker.
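
The idea of arcs carrying condition-action pairs can be shown with a drastically simplified sketch. It is not a full ATN: there is no recursion and no register mechanism, the arcs are tried in one fixed order, and the flattened encoding of the deep structure, the toy past-tense table, and the action names are all our own assumptions. It also assumes the AGT and LEX relations are present.

```python
# Deep structure for the example sentence, flattened for brevity.
deep = {"LEX": "go", "TENSE": "past", "AGT": "John",
        "DIR": ("to", "New York"), "INST": ("by", "train")}

PAST = {"go": "went"}                     # toy lexicon of past-tense forms

def act_agt(d, s):
    s["NP"] = [("N", d["AGT"])]

def act_lex(d, s):
    s["VP"] = [("V", PAST[d["LEX"]] if d.get("TENSE") == "past" else d["LEX"])]

def act_pp(relation):
    def action(d, s):
        prep, noun = d[relation]
        s["VP"].append(("PP", [("PREP", prep), ("NP", [("N", noun)])]))
    return action

# Each arc: (relation tested by the arc condition, action performed on transition).
ARCS = [("AGT", act_agt), ("LEX", act_lex),
        ("DIR", act_pp("DIR")), ("INST", act_pp("INST"))]

def generate(d):
    surface = {}
    for relation, action in ARCS:
        if relation in d:                 # the transition is allowed
            action(d, surface)            # the action builds surface structure
    return ("S", [("NP", surface["NP"]), ("VP", surface["VP"])])

print(generate(deep))
```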

3.2.3 Surface Structure P Marker to Phonological Marker

The implementations of this transformation are of a somewhat ad hoc
nature and difficult to formulate in terms of our model. A further difficulty is
that it is an open question whether the surface structure P marker contains
enough information to generate the phonological marker. For example, consider
"John went to New York." and "John went to New York?"; these sentences
would have the same surface structure P marker yet have different phonological
markers. One solution that is used is to pass down semantic information from
the deeper representations, i.e., in this case the surface structure P marker
would contain an indicator as to whether the sentence is declarative or a question.


Young and Fallside [YOU80] have discussed how the phonological
marker is obtained in SSC. Essentially, stress information is derived from a lexicon
while tone group boundaries and rhythm and intonation patterns are determined
algorithmically. As they point out, it is really a question of the quality of
speech desired. For poor quality speech a very rough and approximate transformation
can be used; for high quality speech the transformation must be more
sophisticated.


3.2.4  Phonological  Marker  to  Speech 

Recall from section 2.6 that output transformations are device dependent.
Here we will assume that a "parametric" synthesizer is to be used.

A parametric synthesizer electronically simulates the human vocal
apparatus. It consists of a periodic source, a noise source, and a filter network.
The periodic source corresponds to the human vocal cords while the noise
source allows one to produce sounds that are caused by turbulence (these are
known as "fricatives", for example /s/). The signals from these two sources are
then fed into a filter network which corresponds to the vocal tract. The vocal
tract, like any physical body, has certain resonant frequencies (called "formants"
in auditory acoustics). When a signal is passed through the vocal tract,
frequencies near the formants pass easily while others tend to be damped out.
Thus formants play an important role in determining the shape of the acoustic
waveform.


A parametric synthesizer allows control of the amplitude of the
sources as well as the filter network. For example, the following eight parameters
can be controlled in the Computalker CT-1 synthesizer [SKE80]:

a0: amplitude of periodic source,

f0: frequency of periodic source,

af: amplitude of noise,

ff: noise resonance frequency,

f1-f3: three formant frequencies,

an: nasal branch amplitude.

During synthesis it is necessary to update these parameters approximately
every 10 msec. in order to produce comprehensible speech [FLA70].
Certain parameters can be derived quite easily from the suprasegmental
features; for example f0, the "pitch", is related to intonation, and the amplitudes
are related to stress. However, it is more difficult to calculate the parameters,
such as formant frequencies, that are related to segmental features. The problem
is that phonemes are idealizations - they describe the sounds produced
when the position of the vocal tract is static. Hence in order to avoid choppy
speech it is necessary to calculate the formant frequencies at intermediate
vocal tract positions during transitions from one phoneme to the next. This
involves solving the wave equation for the vocal tract at each of these intermediate
positions.
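
The frame-by-frame nature of the output can be illustrated with a much cruder stand-in for the calculation just described. The sketch below does not solve the wave equation; it simply interpolates linearly between two sets of formant targets, emitting one parameter frame per 10 msec. The target frequencies are made-up numbers, not measurements of any phoneme.

```python
FRAME_MS = 10                              # update interval from the text

def transition(start, end, duration_ms):
    """Linearly interpolate each parameter, one frame every FRAME_MS.

    A deliberate simplification: real systems derive the intermediate
    formant values from intermediate vocal tract positions.
    """
    steps = duration_ms // FRAME_MS
    return [{name: start[name] + (end[name] - start[name]) * i / steps
             for name in start}
            for i in range(steps + 1)]

# Hypothetical formant targets (Hz) for two successive phonemes.
first = {"f1": 300.0, "f2": 2300.0}
second = {"f1": 700.0, "f2": 1100.0}

frames = transition(first, second, 40)
for frame in frames:
    print(frame)
```

At this update rate the synthesizer consumes 100 parameter frames for every second of speech.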

As difficult as the above may sound, it is possible to accomplish the
task in real time using hardware no more powerful than an LSI 11/23 [WIT79].
Also, as mentioned in section 3.1.4, this task need no longer be left to software
since chips containing the synthesis algorithm are available.

Returning to our model of output generation, the C-rules for this output
transformation are "embedded" in the synthesis algorithm. The final external
representation is a string of synthesizer parameters which are physically
realized approximately every 10 msec. When a listener perceives this physical
realization the perception is that of speech.


4  An  Application  to  Forms 

Many of the experimental Office Information Systems that have been
developed are form oriented ([CHE80], [DEJ80], [ELL80], [HAM77]). A term now
in vogue when describing such systems is "the intelligent form". Although there
is no precise definition, usually such a form is one for which the system does
possess some semantic knowledge. In present systems this knowledge is almost
exclusively procedural. For example, the MIT system [ATT80] allows procedures
to be attached to the fields of a form ("automatic fields"). When the form is
being entered these procedures are activated and perform such actions as
automatically filling in fields or validating user supplied data. Another system,
TLA [HOG81], also provides automatic fields as well as more general "form
procedures" which control the creation, movement, and processing of forms.

However  such  procedural  knowledge  does  not  give  the  complete 
semantics  of  a  form.  For  example,  knowing  that  field  A  is  the  sum  of  field  B  plus 
field  C  does  not  tell  us  whether  field  A  is  a  price  or  an  amount  or  whatever. 
Descriptive  knowledge  is  also  needed  to  specify  the  semantics  of  the  form.  This 
descriptive knowledge can be stored within a conceptual representation of the
form. However, if such knowledge were separately stored for each instance of
the form the amount of storage required would be excessive. Fortunately, by
exploiting certain features of forms this problem may be avoided. In particular,
much of the information contained on a form is redundant, i.e., information is
often repeated except with minor changes. This has led to the suggestion of providing
"templates" which contain the redundant information and slots into which
particular information from form instances can be inserted [TSI80]. An example
is the graphic template (the "form blank") used to generate graphic displays of
forms.

Thus, in order to represent descriptive knowledge, it is useful to
associate "conceptual templates" with forms. These templates are expressed
using the conceptual representation of the OIS. In the following, several examples
of conceptual templates will be given. The conceptual representation used here
is based upon CD theory [2]. This particular representation is used since it has
already been discussed in this paper; in practice any conceptual or meaning
representation should suffice.

Since we are concerned with describing the semantics of forms it is
necessary to choose a form system upon which to base our discussion. The system
used here is known as OFS [CHE80]. In OFS a form is composed of a number
of data fields. A form instance is a set of values for these fields. There are six
types of fields in OFS:

KEY: a unique identifier for the instance,

DATE: date when instance was created,

SIG: a user identification,

U1: a field which must be given a value when the instance is
created; this value cannot be modified,

U2: a field which need not be given a value at form creation;
however, once a value has been assigned it cannot be
changed,

U3: a field with no restrictions.

Generally values for the first three field types are supplied by the system, values
for the last three types by the user. Any of the user fields (i.e., U1, U2, or U3)
may have an attached SIG (signature) field; if so, the signature field is assigned a
value whenever the user field is.

During form creation the KEY and DATE fields are generated by the
system, and the user must give a value for each U1 field. During form
modification only U3 or unassigned U2 fields may be given values. Finally there
is a copy operation which creates a new form instance by duplicating all field
values from the original (except for the KEY which is adjusted to indicate a
copy).
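
The creation, modification, and copy rules can be summarized in a short sketch. It is written under our own assumptions and is not OFS code: the form layout and field names are hypothetical, SIG fields are omitted, and since the text does not say how the KEY of a copy is adjusted, the caller simply supplies a new key.

```python
# Hypothetical order form: field name -> OFS field type (SIG fields omitted).
FIELD_TYPES = {"order_no": "KEY", "created": "DATE",
               "part_no": "U1", "qty": "U2", "remarks": "U3"}

def create(key, date, user_values):
    """Create an instance; every U1 field must be supplied by the user."""
    missing = [f for f, t in FIELD_TYPES.items()
               if t == "U1" and f not in user_values]
    if missing:
        raise ValueError(f"U1 fields need a value at creation: {missing}")
    instance = {f: user_values.get(f) for f in FIELD_TYPES}
    instance["order_no"], instance["created"] = key, date   # system supplied
    return instance

def modify(instance, field, value):
    """Only U3 fields and still-unassigned U2 fields may be given values."""
    kind = FIELD_TYPES[field]
    if kind == "U3" or (kind == "U2" and instance[field] is None):
        instance[field] = value
    else:
        raise PermissionError(f"{field} ({kind}) can no longer be modified")

def copy(instance, new_key):
    """Duplicate all field values except the KEY."""
    duplicate = dict(instance)
    duplicate["order_no"] = new_key
    return duplicate

form = create(41, "1/4/81", {"part_no": "644-301"})
modify(form, "qty", 12)          # U2, still unassigned: allowed
modify(form, "remarks", "rush")  # U3: always allowed
print(form)
```

Because each rule depends only on the field type and on whether a value is present, a reader of a form instance can work backwards from its state to how each value must have been supplied.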


It is these rules which allow the user to infer, if possible, who supplied
a field value, when this was done, and how. All of this knowledge can be
expressed in the conceptual template. For example, in Figure 4.a we see a typical
OFS form blank to which the names of the form fields and their types have
been added. In Figure 4.b is part of the conceptual template for this form. Figures
5.a and 5.b show a form instance and how it may be merged with the conceptual
template. Before discussing these figures it is necessary to introduce
the notation used in conceptual templates:

1) Names for classes of objects begin with a single upper case
character. For example, Order, the class of order forms.

2) Variables which refer to objects are represented by the
class name followed by an integer.

3) An unknown member of a class is represented by a "?" followed
by the class name in lower case.

4) A specific member of a class is represented by a prefix character
followed by the class name in lower case, followed by an identifying
number. For example, §order41 is order form number 41.

5) CD actions and attribute names are in full upper case.

6) Variables which refer to CD attribute values appear in
lower case.

7) Attribute values are enclosed in single quotes, for example
'red', '10 kg.', etc.


[8] The more standard term "attribute" will be used instead of the CD "state".

Returning to Figures 4.a and 4.b we see that the conceptual template
contains four classes of objects: order forms, people, parts, and terminals. The
first section of Figure 4.b shows the correspondence between an object's attribute
values and the fields of a form. Here we use the CD "triple arrow" notation,
i.e., object<≡>attribute(value). The second section of Figure 4.b describes how
a particular field (in this case the part number field) is entered on the form.
There would be one such construct for each field of the form. For example, the
conceptualization produced by supplying this template with values (see Figure
5.b) could generate a sentence such as: "On 1/4/81 Mr. Smith entered '644-301'
for the part number field by typing at a terminal."

While conceptualizations such as this answer the who, when, and how
questions of data entry, we are still lacking in our description. We have not yet
expressed how the different form fields are related to each other. Figure 6
shows the section of the conceptual template which expresses this information.
A paraphrase of this figure is: "Person1 has communicated to PersonS that the
approval of Order1 will enable the transfer of possession of Part1 to the XYZ
Company, resulting in the stock level for this part being incremented by qty."
This is quite a complex piece of information but it does seem to capture the
meaning of the form.

If conceptual templates are used then the output generator must
first merge a form instance with the template. In the example above this
involved merely filling in the slots of the conceptual template. We can construct
a more versatile template by attaching predicates to the conceptualizations. If
a form instance satisfies the predicate then the associated conceptualization is
"activated" (i.e., filled in). Thus we can imagine situations in which an original
form would activate one group of conceptualizations and a copy another. Once
the conceptualizations have been selected from the template and merged with
the form instance, an external representation may be generated using the previously
described methods.
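
The merging step with predicates might look as follows. This is a minimal sketch under our own assumptions: conceptualizations are reduced to flat role tables, a "$" prefix marks a slot, and the ENTER and COPY labels and the field names are hypothetical stand-ins rather than CD primitives or OFS names.

```python
# A template is a list of (predicate, conceptualization) pairs. In a
# conceptualization, a value beginning with "$" is a slot to be filled from
# the form instance; anything else is fixed (redundant) information.
TEMPLATE = [
    (lambda form: not form["copy"],
     {"ACTOR": "$sig", "ACT": "ENTER", "OBJECT": "$part_no", "TIME": "$date"}),
    (lambda form: form["copy"],
     {"ACTOR": "$sig", "ACT": "COPY", "OBJECT": "$key", "TIME": "$date"}),
]

def fill(concept, form):
    """Fill the slots of one conceptualization from a form instance."""
    return {role: form[v[1:]] if v.startswith("$") else v
            for role, v in concept.items()}

def merge(template, form):
    """Activate every conceptualization whose predicate the instance satisfies."""
    return [fill(concept, form)
            for predicate, concept in template if predicate(form)]

original = {"key": 41, "copy": False, "sig": "Mr. Smith",
            "part_no": "644-301", "date": "1/4/81"}
print(merge(TEMPLATE, original))
```

The filled conceptualizations would then be handed to the transformations of section 3 to produce text or speech.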

Figure 6


5  Conclusion 


This paper has presented a model for an output generator that may
be of use to an OIS. This model has been used in systems, such as SSC, that are
presently in operation. The main advantage of the model is that it allows flexibility
in the external representation of information without altering the internal
machine representation.


Abstract

Microcomputer technology has provided the means to produce desktop sized devices
with substantial computing capabilities. This paper explores microcomputer
technology as it exists today and proposes an architecture that is designed to satisfy
the electronic needs and requirements found in the corporate office. In addition to
handling commonplace electronic office applications such as word processing, electronic
mail and electronic filing, particular design emphasis is placed on the need to perform
as a rugged multi-purpose tool that supports general office work. A family of
compatible workstations is proposed that ranges from simple low cost systems that support
an inexpensive video display and telephone device to high performance systems that have
extensive storage and graphics support.
1. Corporate Needs and Requirements.

An office is a loosely structured environment that is responsible for the productive
coordination of events and resources in the corporation [MORG76]. Office procedures
can be viewed as interconnected event driven processes that often have a repeatable
nature but retain an inherent degree of flexibility. To a large extent the tasks of
communications, form handling, forms processing and internal correspondence dominate
the spectrum of activities found in the office [ELLI79]. These activities represent the
"typical" office environment. Office activities are further characterized by their
dynamic nature and state of constant flux. Although many activities and events in the
office are repeated, an office needs to dynamically adapt and cope with exceptional
conditions on a regular basis. These demanding requirements make the use of
"traditional" data processing techniques and equipment undesirable.

Offices cope with a wide spectrum of data ranging from informal scribbles and
memos to complex multi-page legal documents that require precise wording. In
addition, the tasks and activities that are found in the office share an equally diverse
spectrum. Clearly, the corporate office offers a rich, multi-faceted environment. The
design and construction of an effective office workstation is an imposing task because
of the diverse applications it must support. Scribbled notes and tone of voice are often
as important as official documents. It is important to support this wide variety of
information representation by supporting a number of different interfaces such as
graphical input and output, sophisticated hardcopy devices, voice capability, and an
effective communications network. Flexibility is important not only in terms of
information representation but also in terms of the office activities that
manipulate the information.

Offices are dynamic, changing entities. Often, they are driven by exception rather
than rule. In many competitive situations, a corporation's profit and loss figures are
directly based on how quickly the corporation can adapt to rapid change. This is
particularly difficult for many corporations (especially some large ones) since they are a
product of historical evolution and adhere to time honoured procedures either for
historical reasons or for lack of understanding of a more global corporate perspective.

There is a need for a more formal method of defining office procedures [ZISM77].
However, there is a fundamental tradeoff involved in establishing formal procedures.
The more formal the procedure is, the more detailed the specifications for the
procedure become. If a particular procedure is described in great detail one loses a
degree of flexibility in the manner that the procedure is used. Therefore a high level
procedure specification method that retains a degree of flexibility is very important.
Equally important is the need to retain flexibility in information representation.

Human factors play an intrinsic role in the overall system design. A typical
employee will interact with a workstation for a good portion of the working day. Thus
maintaining interest and reducing fatigue are key issues in the workstation design.
The use of non-glare, high resolution displays, for example, can go a long way to
reducing eye fatigue. Friendly and natural interfaces are also important to eliminate
potential sources of frustration. A good human engineering philosophy is to adapt the
machine to man, rather than have man adapt to the machine.

Many applications demand a fast response time and will therefore require
considerable computing capacity. In these applications, a fast response to all but the
largest tasks is important to keep a user's attention focused on the task at hand.
Whenever the response time exceeds a particular elapsed time (on the order of three
seconds), the user's mind naturally wanders from the task at hand. As a consequence,
this interruption in thinking not only reduces productivity but also increases the chance
of user error.
Concurrent microprocessor technology can provide the requisite computing power
needed to support a fast response time. A display processor combined with wide data
paths to both memory and secondary storage can assure that the user gets a fast
response.

Reliability is often neglected when hardware is designed. The workstation will be
the single most complex piece of equipment in the office. The sheer complexity of this
equipment makes it prone to failure. Since a workstation is expected to dominate a
large number of office tasks, it needs to be the single most reliable piece of equipment
in the office. Hardware reliability begins at the design board and continues through
the entire lifetime of the equipment. In this area, much can be learned from the
design of equipment for military use. Here, equipment is designed to be abused rather
than used. Although the corporate office is far removed from arctic tundra and
steaming tropical swamps, the strategy to design for abuse is still sound. This strategy
requires the use of premium quality components that are housed in reinforced, sealed
cabinets that are designed for use in unpredictable environments.

These measures will ensure that an inadvertent coffee spill will not be a costly
mistake. In addition, one might even go so far as to consider using a heat conducting gas
such as helium that reduces the thermal stress on all the components within the
system. This can contribute significantly to the overall reliability of the system.

Reliability can be increased at the manufacturing level by the use of an extensive
"burn in". It is well established that marginal electronic components are likely to fail
in the first year of service. A "burn in" is an accelerated test of the workstation under
harsh conditions that induces failure of marginal components before they leave the
factory.
Moving head disks should be sealed and use a conservative long life design that
minimizes the number of moving components. This strategy can greatly reduce the
probability of data loss due to equipment failure.

The measures outlined above add a considerable cost to the purchase price of
the equipment. However, many other costs are reduced:

1) Maintenance of the equipment is reduced.

2) The cost of data recovery and restoration is reduced.

3) Lost productivity time is reduced.

4) Equipment replacement costs are reduced.

These  considerations  must  be  evaluated  before  the  equipment  is  purchased. 

This section has outlined some of the design philosophy intended for the office
workstation. In addition, some of the needs of the corporation and the individual have
been discussed. The next section focuses on the technologies that are required to
assemble a workstation.

2. A Review of Technology.

Microcomputer technology makes the construction of the independent workstation
a reality. Today complex systems that support concurrency and large memory
spaces are easily assembled with off the shelf components. Sixteen bit microcomputers
and 64 kilobit solid state memories are available in production quantities. It is
expected that this technology will form the backbone of the workstation family. High
performance versions will support a bit driven graphics display teamed with a bit
driven hardcopy device that produces graphic output on plain bond paper.

Mass storage is still most economically handled by moving head disks, which will
probably be around for another decade. Although this technology is well seasoned, it
can accommodate at least another order of magnitude of density improvement.
The technologies for building an office workstation are here. However, the family
of workstations is required to support a wide spectrum of information representation
and task diversity. To provide an effective solution to this problem, the components
and subsystems that comprise each workstation need to be well matched. The next
sections highlight some aspects of the individual technologies that are important in
constructing a family of workstations.

2.1.  Computing  Power 

The  hardware  designer  of  the  1980’s  has  a  large  repertoire  of  high  performance 
components  that  can  be  easily  configured  as  a  central  processor.  Low  power,  high  per¬ 
formance  microcomputer  products  are  ideal  for  the  construction  of  a  workstation  that 
needs  a  self  contained  processor.  Powerful  16  bit  microcomputer  products  can  easily 
support  large  and  complex  office  applications  that  require  a  fast  response  time. 

The self-contained office workstation configuration can make use of a number of
microcomputers with each microcomputer dedicated to a specific task such as
communications, display and disk functions. The proper use of this strategy can free the
general purpose processor from many time consuming housekeeping functions.
Although the 16 bit microcomputers pack substantial power in a compact package,
some applications require the use of a mainframe computing device. In this case, a
high performance computer can be made available for use by a workstation through a
communications network. These computers will need to be centrally located since
they still have extensive power and cooling requirements. Because of the high costs
involved, these devices need to be shared by the user community. It is anticipated,
though, that only the largest tasks will require a machine of this capability.

Another family of computer components that warrants further investigation is
known as bit-slice devices. A bit-slice computer is typically faster than its single chip
counterpart but considerably slower than most mainframe computers. Although
the power requirements of bit-slice devices are greater than those of single chip
microcomputers, they require far less power than a traditional mainframe. A bit-slice
architecture, if used to advantage, makes it easier to ensure compatibility across a
family of machines since the hardware designer is responsible for defining the
instruction set of the machine. Bit-slice devices have another important feature: they are
easily cascaded to form a processor having an arbitrary wordlength. Office tasks are
often characterized by data movement rather than computation. To achieve a high level
of performance a data path on the order of 100 bits can be used. In an environment
dominated by data transfers a real gain in throughput can be easily realized.
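
The cascading property can be sketched in software. The following is only a behavioural model under stated assumptions: the 4 bit slice width and the ripple carry between slices are choices made for this illustration and are not taken from the report, and the function names are hypothetical. It shows how identical slices, chained through their carry lines, form an adder of any multiple of the slice width, including the 100 bit path mentioned above.

```python
from typing import Tuple

SLICE_BITS = 4  # width of one slice; an assumption for this sketch
SLICE_MASK = (1 << SLICE_BITS) - 1


def slice_add(a: int, b: int, carry_in: int) -> Tuple[int, int]:
    """One slice: add two SLICE_BITS-wide operands, return (sum, carry_out)."""
    total = a + b + carry_in
    return total & SLICE_MASK, total >> SLICE_BITS


def cascaded_add(a: int, b: int, n_slices: int) -> Tuple[int, int]:
    """Chain n_slices slices; the carry out of each slice feeds the next one."""
    result, carry = 0, 0
    for i in range(n_slices):
        shift = i * SLICE_BITS
        s, carry = slice_add((a >> shift) & SLICE_MASK,
                             (b >> shift) & SLICE_MASK,
                             carry)
        result |= s << shift
    return result, carry


# Twenty-five 4 bit slices give a 100 bit word.
print(cascaded_add(0x1234, 0x0FFF, 25))
```

Changing the wordlength only changes the number of slices in the chain; the slice itself is untouched, which is the property that makes a compatible family of machines easier to build.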

3.  System  Bus 

There is a need for a common system bus for the entire family of workstation
devices. This ensures hardware compatibility between all configurations of workstations.
A common wide system bus with provisions for multiple bus masters provides the ample
data paths needed to maintain a constantly updating graphics screen while still
maintaining a fast response for the user. The wide system bus also makes it easier to
accommodate additional devices when the need arises.

4.  Solid  State  Memories. 

Each workstation configuration will have its own memory requirements. Modest
configurations will have meager requirements while high performance devices will
probably support large address spaces. Today, low cost, large memories on a single
chip are available in production quantities. Like all memory devices, semiconductor
devices are prone to intermittent errors as well as the occasional failure. The damage
caused by these failures can be minimized through the use of standard error
correcting hardware that should be incorporated into the memory boards.
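
The report does not say which error correcting scheme is intended. As an illustration only, the sketch below uses a Hamming (7,4) code, a common single error correcting code chosen here as an assumption: four data bits are stored with three parity bits, and any single flipped bit in the stored word can be located and repaired when the word is read back.

```python
from typing import List, Tuple


def hamming74_encode(data: List[int]) -> List[int]:
    """Encode 4 data bits into a 7 bit codeword (bit positions 1..7)."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_correct(word: List[int]) -> Tuple[List[int], int]:
    """Return (data bits, error position); position 0 means no error found."""
    c = list(word)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s3  # the syndrome names the faulty position
    if position:
        c[position - 1] ^= 1         # flip the faulty bit back
    return [c[2], c[4], c[5], c[6]], position


stored = hamming74_encode([1, 0, 1, 1])
stored[4] ^= 1                       # simulate an intermittent memory error
print(hamming74_correct(stored))
```

Memory boards would apply a code of this family over wider words, trading a few extra bits per word for automatic correction of single bit faults.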
4.1. Bubble Memories

Bubble memories present a workable alternative to disk storage. They exhibit
many of the properties of disk storage devices (access times, transfer rates etc.). But,
unlike disks, they are inherently more reliable since there are no moving parts to fail.
It is unfortunate that these devices are not yet cost effective when compared to
conventional disks. However, as production of these devices increases there will be dramatic
drops in the price of these memories. Exactly how fast and how far this price will drop
is not yet known. At the present time it would be prudent to provide the proper
"hooks" in the workstation to accommodate this technology when it matures.

4.2.  Moving  Head  Disks. 

Moving head disk technology currently represents the most cost effective means
for storing large files that require random access. It is a proven technology
that still has room for considerable growth both in performance and capacity. Over
the years, there has been a steady decline in cost versus performance. Today it is
possible to purchase an 80 megabyte disk drive in OEM quantities at an attractive price.

A sealed and ruggedized version of the 80 megabyte disk is proposed for the
workstations that require secondary storage. For those workstations that demand high
performance, further modifications are proposed to the disk to provide an independent
data path from each head of the disk to the system bus. By having distinct paths from
each disk head to the rest of the system, an increase in the bandwidth of the disk
device is realized. A high bandwidth is especially important for search and retrieval
operations and to support a graphics display device. For an additional expense, to increase
the bandwidth even further, it may be appropriate to have a second independent arm
built into the disk unit. This would offer several advantages. First, many disk
operations consist of read followed by write operations on different areas of the same disk.
Having two independent disk arms could substantially reduce the seek times involved
with this very common operation. The separate arms can be used to advantage in a
wide range of operations that involve large files by applying some simple prefetch
schemes.
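
The benefit of a second arm for the read followed by write pattern can be estimated with a small model. This is a sketch under assumptions that are not in the report: seek cost is measured simply as cylinders travelled, each request is served by the arm currently nearest to it, and the cylinder numbers are made up for the example.

```python
from typing import Iterable, List


def seek_distance(requests: Iterable[int], arm_positions: List[int]) -> int:
    """Total cylinders travelled when each request goes to the nearest arm."""
    arms = list(arm_positions)
    total = 0
    for cylinder in requests:
        nearest = min(range(len(arms)), key=lambda k: abs(arms[k] - cylinder))
        total += abs(arms[nearest] - cylinder)
        arms[nearest] = cylinder
    return total


# Read at cylinder 100, write at cylinder 700, repeated fifty times.
requests = [100, 700] * 50

one_arm = seek_distance(requests, [0])
two_arms = seek_distance(requests, [0, 0])
print(one_arm, two_arms)
```

With one arm every operation pays the full distance between the two areas; with two arms each settles over one area after the first few requests and later operations need no seek at all.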

The techniques discussed thus far provide a high bandwidth from the secondary
storage to the other devices on the system bus. However, the high bandwidth alone
cannot substantially reduce access times unless processing capabilities are added to
support the fast physical access paths.

Associated with each head can be a track information processor (TIP) that can be
programmed to search and update an entire track in a single disk revolution [HSIA80].
The set of TIPs is connected to a smart controller that contains three major
components: a switch, a processor and a secondary storage map.
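
A behavioural sketch of the search function follows. It is not the design in [HSIA80]; the record layout, the function names and the sequential loop are assumptions made for illustration. Each TIP examines every record on its own track in one pass, standing in for one revolution, and the controller hands the same search condition to all TIPs and merges their results.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]
Track = List[Record]


def tip_search(track: Track, condition: Callable[[Record], bool]) -> List[Record]:
    """One TIP: test every record on its track in a single pass."""
    return [record for record in track if condition(record)]


def cylinder_search(tracks: List[Track],
                    condition: Callable[[Record], bool]) -> List[Tuple[int, Record]]:
    """Controller: give the condition to every TIP and merge the hits.

    In hardware the TIPs would work concurrently; here they run in turn.
    """
    hits: List[Tuple[int, Record]] = []
    for head, track in enumerate(tracks):
        hits.extend((head, record) for record in tip_search(track, condition))
    return hits


tracks = [
    [{"name": "a", "qty": 1}, {"name": "b", "qty": 5}],  # track under head 0
    [{"name": "c", "qty": 7}],                           # track under head 1
]
print(cylinder_search(tracks, lambda r: r["qty"] > 3))
```

Because every head has its own processor, the time to search a cylinder in this model is one revolution regardless of the number of tracks.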

The switch mechanism lies between the system bus and the TIPs and is under
control of the processor. It can be used to advantage in several applications. A read/write
operation on the same disk can proceed directly without loading the system bus by
using the switch to route one TIP's output to another TIP's input. Similarly, it is
possible to broadcast the output of one TIP across the entire width of the system bus
(approximately 100 bits). The switch mechanism can be an aid in handling two
independent disk arms. Here it is quite possible to preposition one arm while the other
is performing a desired operation. When the data from the first arm has been
processed, the second arm is already in position. The switch routes the proper TIP to the
required segment of the system bus.

The disk processor coordinates the activities of both the switch and the TIPs. It is
also responsible for housekeeping operations that handle errors and arm positioning.
In addition to the mundane functions mentioned above, the processor accesses the
memory map which contains an outline of the structure of the entire disk. Depending
on the size of the structure memory it can simply contain a directory of files or it can
contain additional templates which describe the internal contents and structure of the
file. The processor can use this information and instruct the TIPs how to efficiently


Figure: The Structure of a Smart Disk. (Diagram label: To System Bus.)

4.3.  Display 

High resolution bit mapped displays are well suited to handle the rich variety of
data representations that are found in the office. This display has the ability to
generate very small dots in arbitrary patterns. The size of the individual dots is so small
that when they are put together to compose a particular symbol or character, they
give the viewer the appearance of a solid character that does not appear to be
composed from small dots. A resolution of roughly 300 bits per inch is needed to provide
this capability. This is well within the capabilities of black and white video technology
but is still on the horizon with colour technology.

A bit mapped display with a 300 bit per inch resolution requires considerable
storage and tightly timed high performance logic. These commodities are expensive.
Thus the cost of this display can easily be three to five times the cost of a conventional
display device. Also, the use of colour should be carefully considered since it
represents a very natural method of highlighting important areas of the screen. Today,
high resolution in colour displays is limited by the presence of a shadow mask located
between the electron gun and the screen phosphor. It is possible to make a high
resolution colour display device without the use of a shadow mask. One technique that
may be used to achieve this goal is to construct a special three layer phosphor screen
with each colour phosphor sensitive to a specific electron energy level. Different
colours are generated by varying the velocity of the electron beam. This strategy
represents a good compromise solution since it provides the required resolution as well
as a useful colour capability.

Since the display is intended for office use, an 8.5" by 11" viewing area is
mandatory. The display should also have the ability to accommodate both vertical and
horizontal formats. The graphics ability inherent in the bit mapped display can handle a
diverse range of office applications. For example, various type fonts in various type
sizes are easily handled by this display. The graphics display is naturally teamed with
a digitizing tablet to allow figures and diagrams to be easily manipulated.
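
The storage demand of such a display can be worked out from the figures given: 300 bits per inch over an 8.5" by 11" viewing area. The sketch below does the arithmetic. One bit per dot (black and white), a 64 kilobit chip holding 65,536 bits, and the 60 Hz refresh rate are assumptions made for this illustration and do not come from the report.

```python
DOTS_PER_INCH = 300
WIDTH_IN, HEIGHT_IN = 8.5, 11.0
CHIP_BITS = 64 * 1024  # one 64 kilobit memory chip (assumed 65,536 bits)


def frame_buffer_bits(dpi: int, width_in: float, height_in: float,
                      bits_per_dot: int = 1) -> int:
    """Bits needed to hold one full screen image."""
    return round(dpi * width_in) * round(dpi * height_in) * bits_per_dot


bits = frame_buffer_bits(DOTS_PER_INCH, WIDTH_IN, HEIGHT_IN)
chips = -(-bits // CHIP_BITS)  # ceiling division
refresh_bits_per_second = bits * 60  # assumed 60 Hz refresh

print(bits, bits // 8, chips, refresh_bits_per_second)
```

Under these assumptions the screen is 2550 by 3300 dots, about 8.4 million bits or roughly one megabyte, which fills well over a hundred 64 kilobit chips and must be read out at several hundred megabits per second. This is the "considerable storage and tightly timed high performance logic" referred to above.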

If possible, the electronics used to support the display device should be designed
to handle a wide range of video displays. Thus applications that have modest
requirements (or a very limited budget) can use a very inexpensive display device, which can
be easily upgraded when required.
4.4. Hardcopy

Although the proper use of display devices goes a long way to eliminating the need
for paper in the office, a requirement for hardcopy still exists. Since some versions of
the workstation support a graphics capability, an ordinary line printer is not well
suited for use with these architectures. However, a new breed of printing devices has
recently appeared on the marketplace that can be easily teamed with the graphics
display [GOOD80]. These devices are bit driven electrostatic printing devices similar in
size and operation to an ordinary office photocopier. Although these devices are still
very new they provide an ideal support for an electronic office information system.

Although their current purchase price is high, it is expected that the workstation
will reduce the need for paper in the office. It is possible that one hardcopy device can
be shared between several workstations that are in close physical proximity to each
other.

4.5. Communications

Communications is a vital office activity. Much office time is spent in handling
documents that are received, processed and subsequently forwarded. An effective
communications scheme is intrinsic to the overall design of the family of workstations
and to the office itself. Many possible communications configurations can be
considered for a particular installation. For digital communications, several packet
switching schemes have been devised. To handle voice communications over
telephones several digital exchanges are available to satisfy most office needs and
requirements. However, before a specific communications plan is proposed, the
communications requirements of the office need to be assessed.

A typical office uses two modes of communications: verbal and written
communications. Voice communications are handled in person or through the telephone, which
uses a central office to accomplish the routing of calls. Currently, a corporation's
written correspondence is handled manually. In large organizations a special mail
department handles this task.

A versatile office workstation needs to handle both digital and voice data. A high
performance digital telephone exchange that also supports digital packets can handle
both forms of data in a competent fashion.

This central exchange can provide other functions as well. Because the exchange
is a central facility it can provide efficient access to a large central mainframe
computer. The exchange can also support a centralized file server that can reduce the
need for extensive duplication of data. An enhanced digital switching exchange is a
prime candidate for supporting an office workstation for the following reasons:

1) The exchange has the necessary bandwidth needed to accommodate hundreds of
simultaneous voice and digital transmissions. Traditional packet switched networks of
both the store and forward variety as well as the broadcast type lack the requisite
bandwidth and guaranteed delivery time required by this application.

2) Many corporations have evolved around a centralized mainframe and have a
large investment made in the systems this equipment supports. A centralized
exchange can be used to advantage in providing the means to disseminate this
centralized data to the various workstations.

3) A central switching exchange can have specialized software that understands a
number of different network protocols and can optimize and oversee the routing of
data over several diverse networks.

4) The centralized switching facility can also provide file server functions and
thereby reduce the need for individual disk units. The file server can support small
workstation configurations that do not have their own secondary storage devices. Also,
the file server can be used to store data common to several workstations. The central
exchange can also have a number of general purpose processors which can be available
to provide functionality to workstations needing additional support. One nice feature
of such a system is that for a modest investment a user has access to a large volume of
data.
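
The bandwidth claim in the first point can be made concrete with a rough estimate. The sketch assumes standard telephone pulse code modulation, 8000 samples per second at 8 bits per sample, which gives 64,000 bits per second per voice channel. The number of calls and the 9600 bit per second data streams in the example are hypothetical figures chosen for illustration.

```python
from typing import Iterable

SAMPLE_RATE_HZ = 8000   # standard telephone sampling rate
BITS_PER_SAMPLE = 8     # standard telephone PCM sample size


def voice_channel_bps() -> int:
    """Bit rate of one digitized voice channel in one direction."""
    return SAMPLE_RATE_HZ * BITS_PER_SAMPLE


def exchange_load_bps(voice_calls: int, data_streams_bps: Iterable[int]) -> int:
    """Aggregate one-way load on the exchange in bits per second."""
    return voice_calls * voice_channel_bps() + sum(data_streams_bps)


# Hypothetical office: 300 calls in progress plus 100 terminal data streams.
load = exchange_load_bps(300, [9600] * 100)
print(voice_channel_bps(), load)
```

In this example the voice traffic alone is about 19 megabits per second and must be delivered without variable delay, which is the requirement the list above says traditional packet switched networks do not meet.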


Figure: The Central Exchange for phone and workstation support.
4.6. Voice Communications

Adding  a  voice  capability  to  the  workstation  provides  the  means  to  intermix  voice 
with  written  data.  This  represents  a  new  and  unique  capability  that  has  not  yet  been 
explored.  Voice  communications  are  appealing  because  they  are  a  natural  and  spon¬ 
taneous  method  of  communications  which  convey  a  sense  of  emotion  that  is  lost  when 
written  down. 

Converting voice data to a digital bit stream is easily implemented using CODEC
(coder-decoder) components. The same hardware can also be used to synthesize
human speech. In other words, the workstation has the ability to listen and talk. This
feature can be used to direct a worker's attention without using the display or it can
be used as an alternative output facility. Although the task of general voice
recognition is beyond current technology, a limited number of short commands can be
recognized by using this technology. Commands such as "up" and "down" are well within the
capabilities of the proposed hardware.

The  hardware  voice  capability  can  add  a  number  of  functions  not  currently  possi¬ 
ble  with  an  ordinary  desktop  phone.  Office  procedures  will  evolve  to  take  advantage  of 
the  built  in  voice  capability.  It  is  conceivable  that  office  employees  will  use  this 
hardware  to  attach  voice  instructions  to  supplement  written  data.  Voice  instructions 
may  be  filed  and  retrieved  in  the  same  manner  that  written  data  is  currently  accessed. 

5.  Other  Peripherals. 

The hardware discussed above constitutes the major components used in
the fabrication of the family of workstations that can be constructed with off the shelf
components. One major drawback of the proposed workstations is that most of the
interaction occurs between the screen and the keyboard of each workstation. Even a
high resolution display is restrictive in the sense that only a single screen of
information can be displayed at any given time. Many people work at their desks with
twenty or thirty documents scattered over the entire desktop. (It is not clear that
having thirty documents scattered over a desk reflects an organized and productive work
strategy. It is clear though that the data on a single screen is too restrictive.) For this
reason, there is a need for other devices that can distribute the information and
functionality of a workstation over a desktop.

Conventional  office  desks  contain  a  number  of  aids  that  facilitate  office  work. 
Phone  indexes,  electronic  calculators,  and  calendars  are  all  important  aids  that 
enhance  office  productivity.  A  nice  feature  of  these  devices  is  that  they  spread  the 
information  and  capability  they  provide  across  an  entire  desktop  instead  of  concen¬ 
trating  it  at  a  single  physical  location. 

A user has the ability to place these devices anywhere on the desktop and they
may be moved around as needed. Having a number of these aids can, to some extent,
help support a number of simultaneous tasks. It is our philosophy that the use of these
portable desktop aids is to be encouraged.

Electronic counterparts of these devices are, for the most part, easy to construct.
The operation of these devices can be easily handled by the workstation. Unlike their
conventional counterparts, these new devices can show more flexibility and versatility
since they have the ability to be programmed. The following list of devices (this list is
by no means exhaustive) can be considered for the electronic desktop.

5.1.  Frame  of  Reference  Blotter 

This device is a specialized digitizing tablet that physically resembles an ordinary
desktop blotter. The resemblance to a blotting pad ends here. The user can place a
sheet of paper on top of the blotter and write on the sheet. The blotter digitizes any
written information that is placed on the page. The outline of the page is also recorded
and, as a page is moved over the blotter, the workstation maintains the proper
orientation of the document. The user is free to fill in fields and construct diagrams or make
any desired annotation. Any input, with the exception of very fine detail, is digitized
and is available to the workstation for processing in a variety of ways.

5.2.  Small  I/O  Devices. 

These devices consist of a small display, a number of input keys and, possibly, a
joystick housed in a small box that can communicate directly with a workstation.
Because  the  operation  of  these  devices  is  under  the  control  of  the  workstation  they 
may  then  be  programmed  to  support  a  variety  of  functions.  Electronic  calculators, 
appointment  reminders,  and  phone  indexes  are  some  typical  functions  that  can  be  sup¬ 
ported  by  these  devices.  Since  these  devices  are  programmed  they  can  provide  more 
functions  and  error  checking  than  their  conventional  counterparts. 

5.3. Portable Workstation

Although  it  may  not  be  possible  to  take  an  entire  workstation  on  a  business  trip,  it 
would  be  nice  to  have  a  small  but  compatible  workstation  that  can  be  taken  outside  the 
office. Before a trip, the portable workstation is plugged into the office workstation
and collects the information it needs. This data is now portable and may be processed
in the field. When the portable workstation comes back to the office, its contents are
used to update the data held in the original workstation.
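
The check-out and update cycle described above can be sketched as follows. This is only
an illustration, not part of the proposal: the names OfficeWorkstation,
PortableWorkstation, check_out and check_in are hypothetical, and the rule that data
changed in the field simply overwrites the office copy is an assumption, since the text
does not say how conflicting changes would be reconciled.

```python
class PortableWorkstation:
    """Hypothetical portable unit that works on a copy of the office data."""

    def __init__(self, data):
        self.data = dict(data)

    def edit(self, key, value):
        self.data[key] = value


class OfficeWorkstation:
    """Hypothetical office workstation holding named data items."""

    def __init__(self, data):
        self.data = dict(data)

    def check_out(self, keys):
        # Copy only the items needed for the trip onto the portable unit.
        return PortableWorkstation({k: self.data[k] for k in keys})

    def check_in(self, portable):
        # Assumption: items changed in the field overwrite the office copy.
        self.data.update(portable.data)


office = OfficeWorkstation({"report": "draft 1", "budget": "1981 figures"})
portable = office.check_out(["report"])   # before the trip
portable.edit("report", "draft 2")        # processed in the field
office.check_in(portable)                 # back at the office
```

A real design would also need a policy for items changed in both places during the trip.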

Physically, this workstation resembles one of the I/O devices previously discussed.
However, it would have a full keyboard, come with a reasonable display device and
have both memory and processing capabilities. Space and power requirements are key
concerns in the design of this hardware. This workstation would have to easily fit inside
a briefcase and operate at least a full day on its own power. It would also have to be
very rugged in order to withstand day to day use in unpredictable environments.

CMOS semiconductor technology would be an ideal candidate for the logic components
of this device. This technology is well known for its meager power requirements
and high noise immunity. Bubble memory technology should be considered for
providing secondary storage since bubble devices are compact, have no moving parts
and represent a non volatile form of storage.

The only problem in constructing this workstation is the output device. LCD
displays are ideal for low power consumption but, up to now, they have not shown much
promise in large multiplexed displays. Video technology is more suited for handling
the amount of information that needs to be displayed. However, it too is not without
drawbacks. Power consumption and delicate construction are real concerns when
using this technology in a portable workstation. Flat panel plasma displays are another
candidate to consider here.

A  promising  commercial  product  that  is  constructed  with  portability  in  mind  is 
the  SHARP  1211,  a  pocket  sized  computer  that  can  be  programmed  in  Basic.  It  has  a 
liquid  crystal  display,  two  thousand  words  of  memory,  a  Basic  interpreter  and  a  pro¬ 
gram  editor.  A  user  can  optionally  add  a  cassette  recorder  interface.  Today  this  pro¬ 
duct  is  sold  in  retail  stores  for  less  than  $300.  This  machine  is  an  example  of  what  can 
be fabricated using today’s technology. A larger configuration using the
same implementation strategy can accommodate a number of office tasks and can be
the  basis  of  a  portable  workstation. 

6.  Putting  the  Pieces  Together. 

Traditional computing systems have evolved along two paths. The first path is the
mainframe computer approach. In this setup a large user community shares a single
large computer resource. This approach has a number of advantages and disadvantages.
An advantage here is that the user pays only for the resources that are actually
used. On the other hand, the user of a central system has increased difficulty in
maintaining the security of data and cannot predict the response time of the system since
it depends on the current computing activities of the entire user community.

Another  more  recent  approach  to  computing  is  through  the  use  of  personal  com¬ 
puters.  Again,  these  systems  have  specific  advantages  and  disadvantages.  Personal 
computers  can  offer  secure  data  storage  and  predictable  response  times.  However, 
the  occasional  large  task  can  quickly  exceed  the  capacity  of  the  personal  machine. 

The office workstation family is a flexible hybrid of the above two approaches. It
resembles the personal computer in that it has a fair computational capability and
storage capacity whenever required. Thus the problems of data security and response
time are solved. Unlike the personal computer, the workstation has a moderate
bandwidth path to a centralized switching facility that can also provide convenient
access to a mainframe computer. File servers and large "crunching" tasks can be
accommodated using this hardware. The mainframe can provide additional support on an
"as required" basis. This hybrid scheme attempts to pool the real advantages of both
approaches into a single concept.


Although  the  workstation  is  a  blend  of  the  mainframe  and  personal  computer,  this 
property is only one aspect of its overall design. The key to the system is the coordination
of the following subsystems into a user friendly workstation: 1) Smart Disk, 2) System
Bus, 3) Input Output devices, and 4) External Communications.


When all the individual components work well together, there is a synergistic
effect. In other words, the whole is greater than the sum of its components. The smart
disk has electronics to provide search mechanisms and sustained high bandwidths to
the other components. The system bus connects the individual components with wide
datapaths and it is designed to accommodate multiple bus masters. This feature provides
the necessary hooks to add multiple processors to the system. The display also has
processing capabilities to support the bit mapping operations. Finally, access to a
smart external network completes the high performance workstation package.


Internal  Workstation  Structure. 


6.1. A Family Approach

A wide range of hardware is required to support the diverse spectrum of office
work. Some activities will require only a modest amount of hardware support; others
will require extensive support that requires a substantial amount of equipment.

Our proposal is to adopt a common set of operating conventions and a standardized
internal hardware framework that supports all types of office workstations. The
upwards compatibility offers many advantages. First, flexibility is maintained by having
the ability to upgrade or downgrade equipment as the corporation evolves. User
training is also minimized since the user needs only to learn the new features of the
upgraded workstation.

The flexibility of the workstation configuration allows a company to install the
hardware in a number of phases, thus spreading the capital expenditures over a period
of time. The family of workstations ranges from a simple phone and video terminal to a
high performance self contained processor with secondary storage and a graphics
capability. There are many intermediate configurations. For example, a processor,
a display and some main memory together with three or four small I/O devices can
be proficient at handling a host of different tasks. A natural upgrade to this system
would be the addition of a secondary storage device.

Each  configuration  can  be  custom  tailored  to  fit  the  application  on  hand.  As 
requirements  change,  a  board  or  screen  can  be  changed  or  added  to  the  workstation. 
This  strategy  provides  a  genuine  degree  of  flexibility.  It  also  reduces  the  impact  of 
planned obsolescence since new components can be incorporated into the design as
they  become  available. 

6.2.  The  Completed  Package. 

Both  the  packaging  and  the  total  cost  of  the  workstation  are  important  considera¬ 
tions.  First,  let  us  consider  the  costs.  The  high  performance  workstation  configuration 
outlined here costs in the neighbourhood of $25,000, perhaps a little more or a little
less. In any case, this is a considerable expenditure for a single piece of equipment.

Conventional desktop

A Workstation Replacement

One is well justified to ask: is this a worthwhile expenditure? Unfortunately, a yes or no
answer will not suffice. First, the cost of a workstation is based on the cost of components
that are produced in quantity but are not produced for mass consumption.
Custom circuits and specialized techniques come into play when mass production is
considered. These techniques can do much to reduce the cost of the workstation.
Next, many companies amortize the cost of hardware based on a three year lifetime.
This, of course, is based on the fact that many hardware configurations will be well on the
road to obsolescence within this period of time. Intrinsic to the office workstation
design is the idea of future expandability by providing a modular design and construction
combined with wide datapaths that can accommodate higher performance devices
as they become available. In this sense, the high initial capital cost is justified on the
basis of a longer product lifetime.

Many will feel that the high performance workstation proposed here is still too
costly.  In  these  circumstances,  it  would  be  wise  to  consider  one  of  the  more  modest 
workstations  having  more  limited  performance  and  capability.  Surprisingly  enough, 
many  functions  can  be  handled  by  a  $1,000  workstation  that  simply  consists  of  a  phone, 
a  screen  and  some  additional  communications  logic.  Since  there  is  a  strong  devotion 
to  compatibility  across  the  entire  family  of  workstations,  it  is  a  simple  matter  to 
upgrade the workstation hardware.

7. Software

The discussion to this point has centered around the components that are used to
construct the workstation. A few words about the software are now in order. Although
the hardware proposed for the workstation is based on techniques that are well established,
the integration of these components leads to an architecture that has a
different "modus operandi" when compared to traditional computer configurations. To
function effectively, a different approach is required by the software that supports the
system.

Traditional  software  consists  of  programs  that  are  written  through  a  series  of  suc¬ 
cessive  refinements.  Typically,  a  task  is  broken  down  into  a  series  of  subtasks  which 
are  further  decomposed  until  they  can  be  broken  down  no  further.  At  this  point  indivi¬ 
dual  program  statements  specify  the  given  task.  Office  procedures  are  very  rarely 
organized  in  this  fashion.  Office  procedures  manipulate  objects  through  a  series  of  low 
level  manipulations.  They  involve  filling  out  forms,  writing  letters  and  filing  operations. 
There exists a need for a compatible language that provides an easy means to carry
out these tasks. Office procedures are intrinsically different from programming
languages in one other respect: office procedures involve communication from place to
place. Conventional programming languages do not usually accommodate these types of
communications operations.

A set of languages known as object oriented languages is well suited to provide
effective support for office procedures. Object oriented languages are based on entities
called objects and messengers. A messenger specifies a communications path
between two objects and the manipulation to be applied to the objects in question.

Low level manipulations simply consist of a message passed between two objects.
A typical example might involve filing a document in a specified file. In this example,
the document and the specified file are both objects; the filing operation represents
the messenger. Complex operations are easily constructed using a series of low level
manipulations. One nice feature that these languages offer is a statement by statement
interactive capability. Thus spontaneous tasks may be specified one statement
at a time. When a particular procedure has been established, a series of statements
that specify the procedure can be filed for future use.
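
The filing example can be sketched in a present-day object oriented language. The
following Python fragment is only an illustration of the idea, not the language the
paper has in mind; the class names OfficeObject, Document, FileFolder and Messenger,
and the dictionary used as a library of filed procedures, are all hypothetical.

```python
class OfficeObject:
    """Any entity in the office: a document, a file, and so on."""

    def __init__(self, name):
        self.name = name


class Document(OfficeObject):
    pass


class FileFolder(OfficeObject):
    def __init__(self, name):
        super().__init__(name)
        self.contents = []


class Messenger(OfficeObject):
    """Carries one manipulation between two objects; it is itself an object."""

    def __init__(self, name, action):
        super().__init__(name)
        self.action = action

    def send(self, subject, receiver):
        self.action(subject, receiver)


# Low level manipulation: file a document in a specified file.
file_in = Messenger("file", lambda doc, folder: folder.contents.append(doc.name))

letter = Document("letter to supplier")
memo = Document("staff memo")
invoices = FileFolder("invoices")
correspondence = FileFolder("correspondence")

file_in.send(letter, invoices)            # one interactive statement

# An established procedure is a series of statements filed for future use.
library = {
    "file correspondence": [
        (file_in, letter, correspondence),
        (file_in, memo, correspondence),
    ]
}
for messenger, subject, receiver in library["file correspondence"]:
    messenger.send(subject, receiver)
```

Because a stored procedure is just a list of statements, it can be edited or replaced
without changing the objects it manipulates.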

The library of filed procedures can also contain a common set of complex functions
that are often called "experts". An expert consists of a series of object oriented
statements that are designed to handle common but complex operations, such as accounting.
As the procedures in the office evolve, the experts can be changed to reflect the new
operating conventions.

The  objects  in  an  object  oriented  system  represent  entities  that  can  be  either  con¬ 
crete  or  abstract  in  nature.  Messengers  are  also  considered  to  be  objects.  This  feature 
allows  the  system  to  dynamically  create  new  messengers  while  statements  are  exe¬ 
cuted.  The  act  of  sending  an  object  to  a  receiver  provides  a  natural  means  to  model 
physical  communications  such  as  a  mailing  operation.  Object  oriented  languages  can 
be  used  to  support  a  number  of  simultaneous  operations  by  simply  supporting  a 
number of active messengers. Thus it is easy to support work at several desktops
simultaneously or even support simultaneous activities at one desktop.

8.  Conclusions 

This paper has formulated a plan for the design of an office workstation. The subsystems
that constitute the workstation are carefully chosen to maximize usefulness,
versatility and reliability. These components are designed to be compatible with one
another and offer a high degree of performance by maintaining wide data paths and
smart peripheral devices.

The office workstation should be a flexible entity. It should be easily moved and
reconfigured as requirements dictate. Obsolescence is not planned for the
workstation. It is designed to evolve with the changing technologies. To meet this goal
a family of workstations is proposed that shares a common set of core components and
cabinetry.

Throughout the design of the workstation several undercurrents of thought have
molded the overall plan. One of these was a strong desire to adapt the machine to man
rather than have man adapt to the machine. It is hoped that the multi-media support
the hardware provides will be a means to accomplish this goal. It is felt that a successful
system design needs to consider three different perspectives: hardware, software
and human, in order to meet the needs of the electronic office.




References 


ELLI79 Clarence A. Ellis, Gary J. Nutt; Computer Science and Office Information Systems;
Xerox Palo Alto Research Center, June 1979.

GOOD80 David Goodstein; Output Alternatives; Datamation, Feb 80, ppl 3 1, 130.

HSIA80 D. K. Hsiao; Database Manager nerd Hardware - The Arrival of Database Computers;
The seventy-fourth Infotech State of the Art Conference, London, October 1980.

MORG76 K. Morgan; Office Automation Project - A Research Perspective; AFIPS Conference
Proceedings of the NCC, New York, vol. 45, 1976, pp. 505-610.

ZISM77 M. Zisman; Representation, Specification and Automation of Office Procedures;
Ph.D. Thesis, Wharton School, University of Pennsylvania, 1977.


DATA  BASE  versus  MESSAGE  SYSTEMS 


D.  Tsichritzis 

Computer  Systems  Research  Group 
University  of  Toronto 
Toronto, M5S 1A1, Canada


1.  Introduction 

The  raison  d’etre  of  data  base  management  systems  is  to  record  and  inter¬ 
pret  information.  Implicit  in  that  operation  is  the  notion  of  communicating 
information. For instance, by recording some data we can communicate the
information they represent to somebody else having access to the data. Even in
the case of a personal data base we can claim that the data base communicates
information in the future. That is, the same person who recorded the information
can inspect it at a later time. Thus, communication is a basic, although
implicit,  aspect  of  a  data  base  management  system.  Communication,  however, 
is  usually  associated  with  message  sending  systems.  It  is  important  to  investi¬ 
gate  the  ways  that  communication  through  data  bases  is  similar  to,  or  different 
from,  communication  via  messages. 

The  two  concepts  evolved  historically  in  different  ways.  We  can  hypothesize 
that  the  first  messages  were  probably  verbal.  Different  media  were  later  used  to 
record  and  ensure  the  message’s  accuracy.  In  this  way  messages  evolved  into 
letters,  telegrams,  telex  and  finally  electronic  mail.  The  origins  of  data  bases 
were  probably  in  bookkeeping.  For  instance,  clay  tablets  used  to  record  stock 
items  in  a  storeroom  can  be  considered  as  a  first  data  base.  From  these  humble 
origins  evolved  book  entries,  files  and  finally  data  bases.  The  two  concepts  were 
initiated independently and they are still considered different. In terms of
technology, message systems are based on networks, while data base systems
are based on computers. The persons involved in research and development in
the two areas usually come from different backgrounds, i.e., communication and
computer science. However, recently microprocessor technology, i.e., computers,
has made inroads into networks. On the other hand, distributed data bases
imply the existence of networks and their close cooperation with data base systems.
It is becoming increasingly difficult to separate computers and networks.
Hence, we should examine very carefully what separates message sending systems
and distributed data base management systems.

A common definition of a distributed data base is that it provides the image
of a centralized data base (complete global integration) in a geographically
distributed manner. This viewpoint may be self defeating. If we want absolute
integration maybe we should continue with central data bases. Many distributed
data bases imply a distribution of control and data according to user requirements
and applications semantics. The distribution is not solely the result of
system performance requirements. Distributed data bases should have a notion
of personal data and should allow operations which can be used to send information
to specific nodes of the data base. Message systems, as they are commonly
defined, do not provide a facility for global common information. As such, they
cannot be used to integrate information across the system. It is important to
extend message systems in such a way that they can deal easily with global
information.

Before  we  proceed,  we  should  define  the  basic  terms  which  will  be  used 
throughout this paper. We will denote as a record the basic passive unit of data
for the purpose of data base operations. A transaction is the active unit of
operation on one or more records. A message is the basic passive unit of a message
sending system. Finally, a communication is the basic active unit
representing the operation of sending and receiving a message. In the next section
we will discuss the apparent differences between records and messages, and
between transactions and communications.

2.  Differences 

The information present in both messages and records can potentially affect
a group of recipients. This target of the communication presents the first
difference between data base and message systems. Messages usually imply an
individual, or at least well defined, target for the information present in the message.
If the target for the message is not an individual, then the set of recipients
is declared as a group. In the extreme case, we have a complete broadcast
operation. In the data base case, the effect of a transaction is supposed to
be felt by everybody who has access to the same data that the transaction has
changed. Introducing personal information for global view in a data base does
not make much sense. The target of the information communicated by the transaction
is potentially the general public. The target can be limited by security
provisions or data base views. However, the default is a broadcast operation. We
have, therefore, a difference between communications and transactions with respect
to targets, at least in the usual modes of operation. This situation, however, is
rapidly changing. Electronic mail makes message broadcasting possible and
realistic. On the other hand, data base views make individual targeting of data
base transactions plausible.

A related difference between communications and transactions is with
respect to notification. A message is supposed to be delivered to the recipient.
That is, the recipient is alerted to the presence of the message. The effect of a
transaction, on the other hand, is simply posted without any notifications. The
potential recipient of the information communicated by the transaction is not
alerted. The recipient has to issue a separate operation to receive the
information. This difference is related to the individual targeting of the communication.
In messages the communication is targeted, hence we know whom
to notify. In transactions the effect of the transaction is global. The interested
parties are supposed to issue queries to see the transaction’s record. This
difference is slowly disappearing. In many message systems persons set up
mailboxes to receive their messages. The persons pick the messages they want
on their own initiative. In this way, they avoid being bombarded with junk mail.
In the case of transactions we could use side effects which provide alerts to
interested parties. For instance, the reservation of the last seat in a flight can
alert the travel agents that the flight is not available.
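
The flight example can be sketched as a transaction with a notifying side effect. The
Python fragment below is only an illustration under stated assumptions: the Flight class,
its subscribe and reserve_seat methods, and the flight number are hypothetical and do not
describe any particular data base system.

```python
class Flight:
    """Hypothetical data base record for a flight with a fixed number of seats."""

    def __init__(self, number, seats):
        self.number = number
        self.seats_left = seats
        self.subscribers = []     # interested parties to be alerted

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def reserve_seat(self):
        """Transaction: take one seat; alert subscribers if it was the last."""
        if self.seats_left == 0:
            raise ValueError(f"flight {self.number} is full")
        self.seats_left -= 1
        if self.seats_left == 0:
            # Side effect: notify interested parties instead of waiting for queries.
            for notify in self.subscribers:
                notify(f"flight {self.number} is no longer available")


alerts = []                       # stands in for a travel agent's mailbox
flight = Flight("TO101", seats=2)
flight.subscribe(alerts.append)
flight.reserve_seat()             # one seat remains, so no alert is sent
flight.reserve_seat()             # last seat taken, so the alert is delivered
```

The transaction still changes a shared record, but the alert gives it the targeted
delivery normally associated with a message.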

Another  obvious  separation  deals  with  recording  of  the  information  being 
communicated.  The  emphasis  in  messages  is  supposed  to  be  communication. 
Recording  of  the  message  is  supposed  to  be  of  secondary  importance.  In  terms 
of transactions, records are supposed to have lasting importance. No transaction
makes sense in the absence of related records. This separation is again
artificial. On close observation, we find that most official messages are
recorded.  They  are  recorded,  in  fact,  twice  by  both  the  sender  and  the 
receiver(s).  A  record  can  be  as  temporary  as  the  recording  of  a  message.  Alter¬ 
natively  the  recording  of  a  message  can  be  as  permanent  as  a  record.  The 
recording  has  nothing  to  do  with  the  mode  of  communication  but  with  the 
importance  of  the  message. 

The formality of the communication brings up another artificial difference.
There is a feeling that messages are supposed to be informal. Transactions are
supposed to be formal. On close inspection there is nothing informal about a
telex message or business letter. It carries as much weight as a record. On the
other hand, a transaction in some of the computer games played via data bases
is very frivolous. The real difference is in structure. Messages have evolved free
of format. In fact, a message is usually defined as having a header and a body
[Kirstein 1980]. No restriction is placed on the format of the contents. Data
base records, on the other hand, have a very strict format. The absence of formatting
requirements makes messages simpler to use for informal communication.
As a result, messages have an image of personal communication while
records have a rather sterile image of data processing. The difference in terms
of structure is slowly disappearing. Data base research perceives the need for
handling text, voice, pictures, etc. as an integral part of a data base. On the
other hand, message sending systems cannot continue forever to capture structure
in the headers. They need to exploit the structure which is present in the
contents of the messages.

Another  difference  between  messages  and  data  base  records  deals  with 
interpretation.  By  interpretation  we  mean  general  rules  for  interpreting  the 
contents  of  a  message  or  record.  Messages  carry  their  own  interpretation. 
There  is  no  separation  in  a  message  between  data  and  interpretation.  In  the 
case of data base records interpretation is abstracted and forms a separate part
of the data base, i.e., the schema. A transaction has to abide by the stated
interpretation in the schema. This situation implies that transactions can only
be issued with respect to a schema. Messages can be sent without any notion of
a schema. There is only the assumption of some universal interpreting rules,
e.g., natural language. This difference is slowly disappearing. In knowledge
based systems, metadata (schema) and data are mixed. A transaction is issued
with respect to the knowledge base without a notion of a schema. We see, therefore,
the interpretation becoming part of the data base. On the other hand, in
form systems messages have a separate interpretation in terms of their templates
[Tsichritzis 1980]. Electronic messages in terms of forms have a declared
type. The form’s type provides a general interpretation which does not necessarily
accompany every message. The form’s type can be sent separately from
the values of the form’s attributes.
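
The separation of a form's type from its values can be sketched as follows. This Python
fragment is an illustration only, not the form system cited above; the names FormType,
FormMessage and interpret, and the purchase order example, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FormType:
    """Template giving the general interpretation of one kind of form."""
    name: str
    attributes: tuple


@dataclass(frozen=True)
class FormMessage:
    """A message that carries only a declared type name and attribute values."""
    type_name: str
    values: tuple


def interpret(message, known_types):
    """Pair each value with its attribute name using the separately sent type."""
    form_type = known_types[message.type_name]
    if len(form_type.attributes) != len(message.values):
        raise ValueError("message does not match its declared type")
    return dict(zip(form_type.attributes, message.values))


# The type is sent once and kept by the receiver.
known_types = {"purchase_order": FormType("purchase_order", ("item", "quantity"))}

# Each later message carries values only.
order = FormMessage("purchase_order", ("paper", 500))
fields = interpret(order, known_types)
```

Here the table of known types plays the role that the schema plays for data base records.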

A very important difference between messages and data base records is
ownership. By ownership we mean whether a particular unit, i.e., person, program,
or station, has complete control over a message or record. In message
systems a message is initially owned by the sender. The ownership is
transferred eventually to the receiver. The case of a record is very different.
The record is supposed to be owned by a third party (the data base). A transaction
does not imply a change of ownership. This difference is a direct result of
the global versus personal view of the information. Messages are considered
individual property. Records are considered common, i.e., kept by the system
for the general public. This distinction is only in degree rather than substance.
In many electronic mail systems the message is temporarily owned by the system.
One can make the claim that this system owned message is the real message.
The sender just constructs an image and the receiver obtains an image of
the message. The real message is owned by a third party, i.e., the message system.
In the same way a transaction goes through three stages in terms of communication.
A user initiates a transaction, which implies that he owns temporarily
an image of the data base record(s). In fact, locking ensures this type
of temporary ownership. The record is then released and reverts to the system.
Finally, another user obtains temporary ownership of the data base records in
order to observe the effects of the transaction. In this framework message systems
and data base systems are not very different.

Finally, there is a misleading difference with respect to implementation.
That is, the prevailing technology used in the implementation of the systems is
different. Message systems imply networks, while transaction systems imply
computers. This difference is only historical and it currently has no meaning.
Message systems are very important within the same computer system. In fact,
many electronic mail facilities are implemented within a central system. At the
same time transactions in distributed data bases imply a number of messages
being passed around in a network. The picture is additionally blurred with the
emergence of local networks and servers. In such an environment it is difficult
to define what is local in a computer system. Peripherals and other computers
are immediately accessible through the local network.

We  hope  that  we  have  persuaded  the  reader  that  there  is  no  substantial 
conceptual  difference  between  message  systems  and  transaction  oriented  data 
base  systems.  Whatever  differences  exist  are  historical  or  they  imply  different 
emphasis. In office information systems there is a real effort towards
integration. This implies an integration of message and data base systems.

3. Common framework

In order to deal with data bases and message systems in a uniform manner
we need a common framework. In this section we will attempt to construct such
a framework. Let us define the cumulative information container of our system
as a communication base. A communication base is a medium where users
can add and obtain information. The main purpose of the communication base
is to provide a means for users to communicate with each other. In addition,
the communication base can retain some or all of the information for further
use, i.e., communication at a future time. The basic unit of the communication
base will be referred to as a datagram. A datagram is the basic unit of adding,
recording and obtaining information from the communication base.

A datagram consists of a system-wide, unique identifier and some contents.
It can be optionally typed. However, even bit strings or byte strings can be
contents of a datagram without any added interpretation. A number of data
categories can be defined in such an environment. A datagram can be declared
to be a type within a certain category. In such a case the datagram will inherit
the structure, constraints and operations of that category. For example, a
business paper form can be thought of as a datagram of a particular type. Notice
that we make a distinction between the datagram’s unique identifier and its
contents. This is very important since the contents are subject to change,
including change of format. No content value (or set of values) is stable enough
to uniquely identify the datagram. Two different datagrams may have the same
contents. The identifier of the datagram encapsulates an instance of a
datagram.
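
The separation of identifier and contents can be illustrated with a minimal
sketch. It assumes Python and a randomly generated identifier; the text
prescribes neither, and the names Datagram, ident and type_name are purely
illustrative.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Datagram:
    # Contents are uninterpreted bytes and may change over time,
    # so they are excluded from equality comparison.
    contents: bytes = field(compare=False)
    # Optional type name; None means an untyped bit or byte string.
    type_name: Optional[str] = field(default=None, compare=False)
    # System-wide unique identifier (assumed here to be a random UUID).
    ident: str = field(default_factory=lambda: uuid.uuid4().hex)


first = Datagram(b"purchase order")
second = Datagram(b"purchase order")   # same contents, different datagram
original_ident = first.ident
first.contents = b"purchase order, revised"   # identity survives the change
```

In this sketch two datagrams are equal only when their identifiers are equal,
which mirrors the statement that two different datagrams may have the same
contents.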

There are two broad categories of operations. The first deals with adding
datagrams to the communication base. The second deals with obtaining
datagrams from the communication base. In both cases we need a notion of
address.  An  address  is  a  unique  identifier  associated  with  an  agent  adding  or 
obtaining  datagrams  in  our  environment.  An  address  can  be  associated  with  a 
station,  a  person,  a  program  or  any  other  unit  which  is  capable  of  adding  and 
obtaining  datagrams.  The  system  knows  the  set  of  global  addresses.  It  does  not 
matter  whether  addresses  are  evaluated  centrally  or  in  a  distributed  fashion  as 
long  as  they  are  evaluated  in  a  deterministic  way.  Every  operation  on  the  com¬ 
munication  base  is  issued  by  an  address  and  it  potentially  affects  other 
addresses. 

Associated with the operation of adding datagrams to the communication
base is a notion of scope. A scope is evaluated into a set of addresses which are
affected by an added datagram, i.e., the potential recipients plus the sender
address. The scope of a datagram can be an individual address, a group of
addresses or all addresses in the system. Associated with the scope is the
optional requirement of an alert. An alert is a notification of the presence of a
datagram which the recipient addresses participating in the scope are forced to
receive. They may or may not receive the datagram as they choose. They are
guaranteed, however, to have received the alert announcing its presence. An
alert can be thought to contain at least the datagram’s unique identifier plus the
address of its origin.
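
The relationship between scope and alert can be sketched as follows. This is
only an illustration under simplifying assumptions: addresses are plain
strings, the scope is given as an explicit set of recipients, and the class and
method names (CommunicationBase, add, obtain) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    datagram_id: str   # unique identifier of the announced datagram
    origin: str        # address that added the datagram


class CommunicationBase:
    def __init__(self, addresses):
        self.addresses = set(addresses)              # globally known addresses
        self.datagrams = {}                          # id -> (contents, scope)
        self.alerts = {a: [] for a in self.addresses}

    def add(self, origin, datagram_id, contents, recipients, alert=False):
        scope = set(recipients) | {origin}           # recipients plus sender
        unknown = scope - self.addresses
        if unknown:
            raise ValueError(f"unknown addresses: {sorted(unknown)}")
        self.datagrams[datagram_id] = (contents, scope)
        if alert:                                    # the alert is optional
            for address in scope - {origin}:
                self.alerts[address].append(Alert(datagram_id, origin))

    def obtain(self, address, datagram_id):
        contents, scope = self.datagrams[datagram_id]
        if address not in scope:
            raise PermissionError(f"{address} is outside the scope")
        return contents


base = CommunicationBase({"ann", "bob", "carl"})
base.add("ann", "d1", b"meeting at 10", recipients={"bob"}, alert=True)
```

Here bob is forced to receive the alert but decides for himself whether to call
obtain; carl, being outside the scope, sees neither the alert nor the datagram.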

A  scope  can  be  specified  as  a  set  of  addresses.  In  the  set  we  may  include 
station  addresses  meaning  that  the  datagram  will  be  posted  in  these  stations. 
Notice that a station does not have to coincide with a computer; it should be
thought of as a logical workstation which is used as a bulletin board to post the
datagram. A scope may include person identification addresses. In this case
the person will get an alert of the presence of the datagram no matter on what
station  he  logs  in.  In  two  extreme  cases  a  datagram  may  have  a  private,  or  a 
public  scope.  In  the  private  case  its  scope  is  only  the  originating  address. 
Nobody  else  can  access  the  datagram.  In  the  public  case  the  datagram  is  posted 
for  view  by  every  potential  user  or  user’s  agent. 

An interesting approach for the specification of a scope is through a
procedure. Such a procedure is used to evaluate the addresses in the scope. The
evaluation may be at the point of origin, centrally, or distributed. In the case of
distributed evaluation the datagram plus the scope defining procedure are
routed through the system. Routing processes are defined in the system which
interpret the scope procedure on the basis of the datagram’s contents and
forward the datagram accordingly. Using a procedural definition of scope we can
have a very flexible addressing environment. For example, addresses in a scope
need not be predefined, but can depend on the contents of the datagram and
the state of the system.
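
A procedurally defined scope can be sketched as an ordinary function of the
datagram's contents and the system state. The example below is hypothetical:
the expense-claim datagram, the field names and the layout of the state
dictionary are assumptions made only for illustration.

```python
def expense_scope(contents, state):
    """Hypothetical scope procedure for an expense-claim datagram."""
    sender = contents["claimant"]
    # The scope always holds the sender and the sender's current manager.
    scope = {sender, state["manager_of"][sender]}
    # Large claims are also routed to whoever is currently acting as auditor.
    if contents["amount"] > state["audit_limit"]:
        scope.add(state["auditor"])
    return scope


state = {"manager_of": {"ann": "bob"}, "audit_limit": 500, "auditor": "carl"}
small = expense_scope({"claimant": "ann", "amount": 120}, state)
large = expense_scope({"claimant": "ann", "amount": 900}, state)
```

The same procedure yields different address sets for different contents, and
would yield different sets again if the state (for example, the acting auditor)
changed between evaluations.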

Each datagram added to the system can be recorded in three different
ways. In the first case the datagram is recorded in each address of the scope
including the address of the sender. Each copy is from then on independent and
it is treated as a separate datagram. In the second case one logically common
recording is supposed to exist for everybody in the scope. This does not imply a
unique copy. There may be multiple copies of the datagram which the
communication base will guarantee to be always consistent. Finally, in the third case
the datagram is recorded only until it is received by the recipients and it then
disappears. If the recipients want a copy they need to explicitly copy it after
they receive it.
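
The three recording modes can be contrasted in a small sketch. It is a
simplification under stated assumptions: a single in-memory store stands in for
the communication base, the logically common recording is kept as one physical
copy, and the names Recording and Recorder are illustrative.

```python
from enum import Enum


class Recording(Enum):
    COPIES = "independent copy per address"
    COMMON = "one logically common recording"
    TRANSIENT = "kept only until received"


class Recorder:
    def __init__(self):
        self.copies = {}     # (address, id) -> contents, one per address
        self.common = {}     # id -> contents, shared by the whole scope
        self.pending = {}    # id -> (contents, addresses still to receive)

    def add(self, origin, ident, contents, scope, mode):
        if mode is Recording.COPIES:
            for address in scope:
                self.copies[(address, ident)] = contents
        elif mode is Recording.COMMON:
            self.common[ident] = contents
        else:
            self.pending[ident] = (contents, set(scope) - {origin})

    def receive(self, address, ident):
        """Deliver a transient datagram; drop it after the last recipient."""
        contents, waiting = self.pending[ident]
        if address not in waiting:
            raise KeyError(address)
        waiting.remove(address)
        if not waiting:
            del self.pending[ident]
        return contents


rec = Recorder()
scope = {"ann", "bob", "carl"}
rec.add("ann", "d1", b"memo", scope, Recording.COPIES)
rec.copies[("bob", "d1")] = b"memo with notes"   # affects bob's copy only
rec.add("ann", "d2", b"notice", scope, Recording.TRANSIENT)
first = rec.receive("bob", "d2")
still_pending = "d2" in rec.pending
second = rec.receive("carl", "d2")
```

A real system would have to keep multiple physical copies of a common recording
consistent; that problem is deliberately left out of this sketch.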

Another form of adding information is by modifying an existing datagram.
The datagram is either private or common to a scope. The modification of a
private datagram simply changes the datagram’s body as viewed by that
particular address. It does not convey any information to anybody else. The
modification of a common datagram potentially transmits information to all the
addresses in the scope. For instance, the answer to a message can be recorded
as a modification to the datagram carrying the message. In another example,
communicating general information can be effected by modifying a common
datagram. After each modification an alert is issued to all addresses in the
scope for which such an alert has been defined.

Obtaining datagrams from the communication base can be effected in two
different ways. In the first case an alert has been issued to that address
regarding the existence of a datagram. The user can optionally obtain the datagram
and record it if he is interested. There is no need for a selection. In the second
case the user is actively seeking datagrams which are available to him, i.e., his
address is part of their scope. In that mode the user queries the communication
base for the existence of datagrams from particular addresses, of a particular
type, or of particular contents. This operation involves selection of
datagrams and presupposes a general query facility of both formatted and
non-formatted data.

A view can be defined for obtaining datagrams. A view is the dual concept
of a scope. The scope relates to the addresses that are affected by a datagram.
A view relates to the addresses from which this address cares to receive
datagrams. A view can also specify types of datagrams and even contents. In
this way it can serve as a filter for the datagrams affecting an address.

A very important case of a view is an active view. In data base systems
views are passive, defining ways to look at data when a query is initiated. In our
environment views can be defined which actively collect datagrams. In this way
they act as automatic procedures [Hogg et al 1981]. They are constantly waiting
for prespecified datagrams to enter the communication base. When they
assemble the desired datagrams they may trigger some actions on them or alert a
user of their presence.
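
An active view of this kind can be sketched as an object that the communication
base notifies on every arrival. The sketch assumes datagrams are represented as
dictionaries and that the view fires once a fixed number of matching datagrams
has been assembled; both choices, and the name ActiveView, are illustrative.

```python
class ActiveView:
    def __init__(self, wanted, needed, action):
        self.wanted = wanted      # predicate selecting datagrams of interest
        self.needed = needed      # how many must be assembled before acting
        self.action = action      # callback run on the assembled datagrams
        self.collected = []

    def notify(self, datagram):
        """Called by the communication base whenever a datagram enters."""
        if not self.wanted(datagram):
            return
        self.collected.append(datagram)
        if len(self.collected) >= self.needed:
            batch, self.collected = self.collected, []
            self.action(batch)


triggered = []
view = ActiveView(
    wanted=lambda d: d["type"] == "timesheet",
    needed=2,
    action=lambda batch: triggered.append([d["from"] for d in batch]),
)
for datagram in (
    {"type": "timesheet", "from": "ann"},
    {"type": "memo", "from": "bob"},
    {"type": "timesheet", "from": "carl"},
):
    view.notify(datagram)
```

The memo is ignored, and the action fires only when the second timesheet
arrives, which is the waiting behaviour the text attributes to active views.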

Types of datagrams can be defined. Each type specifies a structure for the
datagram contents including attribute names when appropriate. The type may
also involve integrity constraints which can be local to each datagram, local to
each address originating datagrams of that type, or global for that type.
Integrity constraints can also be specified which are global for datagrams of
many types in a scope, or view, or global for the whole communication base. It
should be obvious that global constraints are much harder to enforce. On the
other hand local constraints may allow inconsistencies between datagrams
present in the communication base.

We  claim  that  the  outlined  framework  can  provide  a  facility  for  both  mes¬ 
sages  and  transactions  as  we  currently  understand  them.  The  adding  and 
receiving  of  non-formatted  datagrams  resemble  the  case  of  a  message  system. 
The  adding,  modifying  and  selecting  of  common,  strictly  typed  datagrams 
resemble  transaction  oriented  data  base  systems.  What  is  interesting  is  that 
such  a  framework  points  the  way  to  many  other  possibilities.  Such  possibilities 
have been suggested as new facilities of either data base or message sending
systems.  For  example,  message  sending  systems  try  to  take  advantage  of  what¬ 
ever  structure  exists  in  the  messages.  On  the  other  hand,  data  base  manage¬ 
ment  systems  try  to  handle  text  and  other  non-formatted  data. 

4.  Research  directions 

The  framework  outlined  in  the  previous  section  raises  many  questions  which 
point  to  several  research  directions.  There  are  a  number  of  issues  which  need 
careful  investigation.  The  first  very  basic  question  deals  with  the  notion  of  an 
address. An address in a simple message system is geographical. Transaction
systems  have  no  notion  of  address.  They  incorporate,  however,  the  notion  of  a 
user  identifier.  An  address  in  our  environment  should  combine  station,  geo¬ 
graphical  address,  person  and  role  that  the  person  is  playing.  In  addition,  it 
should  permit  a  process  to  be  addressed  which  serves  as  an  agent  for  a  particu¬ 
lar  person.  It  is  important  to  have  a  flexible  facility  where  new  addresses  can  be 
defined  and  addresses  can  be  modified.  Notice  that  in  distributed  data  base 
management  systems  the  notion  of  address  is  being  bypassed  by  assuming  a  glo¬ 
bal  data  base  that  can  be  accessed  and  manipulated  from  any  point.  We  feel 
that  this  is  not  adequate.  There  is  a  need  for  logical  addresses,  separate  from 
physical  node  addresses,  in  a  general  communication  base  system.  The  investi¬ 
gation  of  flexible  and  powerful  addressing  schemes  is  an  important  research 
direction. 

A  second  basic  issue  relates  to  scopes.  In  its  most  elementary  form  a  scope 
is specified as a list of addresses. There are many questions regarding the
structure  of  this  list.  Another  problem  concerns  the  definition  and  evaluation  of 
a  scope.  Is  a  scope  globally  defined?  Is  it  evaluated  locally  at  the  origin 
address?  Is  the  definition  and  evaluation  of  the  scope  distributed  in  the  system? 
Scopes  can  be  defined  by  procedures  which  evaluate  the  list  of  addresses  on  the 
basis of the datagram’s contents and other information available in the system.
Scopes  can  also  have  an  active  component.  In  addition  to  providing  a  list  of  tar¬ 
get  addresses  they  may  define  a  transformation  of  the  datagram.  In  this  way 
the  datagram  will  have  different  structure  or  contents  for  each  destination 
address.  Such  powerful  scopes  may  be  used  to  customize  datagrams  depending 
on  their  recipients,  or  process  them  before  they  are  sent  away. 

A  third  direction  of  research  deals  with  views.  A  data  base  view  behaves 
mainly  as  a  filter.  It  allows  the  user  to  see  only  parts  of  the  data  base  in  a 
specific  way.  A  view  in  our  environment  may  actually  modify  extensively  a 
datagram. For example, a view can get rid of junk mail, trim down the received
datagrams, or even get some cumulative statistics about the received
datagrams.  In  addition,  a  view  can  be  defined  as  an  active  participant  being  on 
the  lookout  for  some  datagrams,  filing  datagrams  when  they  arrive  and  perform¬ 
ing  automatically  basic  tasks.  Both  scopes  and  views  can  be  thought  of  as  cases 
of  automatic  procedures.  However,  in  both  cases  there  is  a  need  for  research  to 
define  exactly  their  facilities. 

A  fourth  direction  of  research  involves  structure,  or  the  absence  of  it. 
Datagrams may have different types. Such types should allow text, voicegrams,
pictures or video messages. In addition, the structure may be different in
different addresses for the same datagram. That is, the structure may be part
of the definition of the scope or view as opposed to the definition of the
datagram. This facility will allow a datagram to be adapted to the facilities
available in local stations where it is received. The issue is not only the ability to
handle non-formatted data. We should also allow transformations of the format.
This  may  imply  a  unique  internal  representation  of  the  datagrams,  e.g.,  bit  or 
byte  strings.  We  need  research  to  learn  to  cope  effectively  with  structure  and 
use  it  properly  only  when  we  can  take  advantage  of  it. 

The fifth research direction deals with integrity constraints. Integrity
constraints in data bases are supposed to have universal truth. In message systems
they are not emphasized. There is a need for integrity constraints to be
specified and evaluated within a certain context. For instance, they can apply
within an address, a scope, a type or a view. Notice the difference between a
constraint on a scope and a constraint on a view. In the first case two
inconsistent datagrams cannot be added in the communication base, at least within
that scope. In the second case received datagrams may be inconsistent but the
recipient refuses to see them within the same view. In the first case we avoid
introducing inconsistent information. In the second case we refuse to be fed
inconsistent information. Research is needed to investigate specification and
enforcement of such a flexible integrity constraint environment.
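
The difference between a scope constraint and a view constraint can be made
concrete with a small sketch. The constraint itself (one invoice number implies
one total) and the policy of keeping the earliest datagram in a view are
assumptions chosen only for illustration; the text does not fix either.

```python
def consistent(a, b):
    # Illustrative constraint: one invoice number implies one total.
    return a["invoice"] != b["invoice"] or a["total"] == b["total"]


def add_under_scope_constraint(recorded, new):
    """Scope constraint: refuse to add an inconsistent datagram."""
    if all(consistent(old, new) for old in recorded):
        recorded.append(new)
        return True
    return False


def see_under_view_constraint(received):
    """View constraint: show only a mutually consistent subset."""
    visible = []
    for datagram in received:
        if all(consistent(shown, datagram) for shown in visible):
            visible.append(datagram)
    return visible


d1 = {"invoice": 7, "total": 100}
d2 = {"invoice": 7, "total": 250}   # contradicts d1
d3 = {"invoice": 8, "total": 40}

recorded = []
accepted = [add_under_scope_constraint(recorded, d) for d in (d1, d2, d3)]
visible = see_under_view_constraint([d1, d2, d3])
```

Under the scope constraint d2 never enters the communication base; under the
view constraint d2 exists but is simply not shown alongside d1.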

A  sixth  research  direction  deals  with  side  effects.  In  data  base  systems  side 
effects are considered an unwanted, undesirable feature (anomalies). In
communication base systems side effects are very important. We have already
hinted at the presence of side effects in the definition of scopes and views.
Alerts are also side effects which should be supported by the system. The
specification of side effects is not an easy matter. Their automatic triggering is
also hard. Finally, the evaluation of their influence on the correct operation of
the system is extremely difficult. Research is needed in all these areas. The
emphasis  of  the  research  should  be  on  the  specification  and  application  of  side 
effects  and  not  their  avoidance. 

A  seventh  research  direction  deals  with  uniform  interfaces.  We  need  a  sim¬ 
ple,  easy  to  use  interface  which  can  be  used  to  send  messages  and  specify  tran¬ 
sactions.  The  construction  of  a  new  message  and  the  entry  of  a  new  record 
should  be  specified  as  a  datagram  specification.  The  receipt  of  a  message  and  a 
query  transaction  should  be  specified  as  an  operation  of  obtaining  a  datagram. 
Finally, the change of a message and an update transaction should be specified
as  a  datagram  modification.  In  addition,  the  user  interface  should  allow 
different  optional  structures  and  communication  media.  There  is  also  a  need  for 
a  flexible  user  interface  for  the  definition  of  scopes  and  views  in  terms  of  side 
effects  and  integrity  constraints.  Obtaining  a  nice  user  interface  for  such  a 
powerful  environment  will  be  difficult  and  needs  much  further  research. 

Finally, a last, all encompassing research direction is architecture. A
communication base system, as we have described it, needs to be built out of
personal computers, servers, mainframes, local networks and global networks.
There are many important pragmatic problems relating to multiple copies,
consistency, concurrency control, backup and recovery, etc. The architecture for
supporting such a powerful facility will be complex. The implementation
problems will be with us for a long time.

5.  Concluding  remarks 

Data base management has moved into distributed data bases while trying
to retain a global view of the data base. This requirement has introduced a vast
collection of problems related to the architecture of distributed data base
systems. While dealing with these problems data base researchers ignored other
basic  approaches  and  requirements.  There  is  a  need  to  comprehend  the  overall 
picture  of  information  exchange.  Data  base  management  is  only  one  approach 
for  communicating  information.  If  we  persist  on  the  same  approach  we  run  the 
risk  of  getting  bogged  down  by  difficult  and  sometimes  unrealistic  problems.  If 
we  accept  other  communication  approaches  we  are  led  to  many  new,  interesting 
techniques.  For  example,  message  systems  are  following  a  very  different 
approach.  Messages  are  exchanged  between  addresses  allowing  great  flexibility 
on  their  routing  and  the  structure  of  their  contents.  Distributed  data  base 
management  should  be  heavily  influenced  by  the  problems  and  techniques  of 
message systems. On the other hand, many messages are also recorded for
further  processing.  To  achieve  this  goal  message  systems  have  to  borrow  many 
ideas  and  techniques  from  data  base  management. 

In  this  paper  we  propose  a  framework  for  viewing  both  message  sending  sys¬ 
tems  and  data  base  management  systems  in  a  complementary  fashion.  We  hope 
that  the  realisation  of  this  framework  will  lead  to  many  interesting  new  research 
directions. 


DISTRIBUTED QUERY FACILITIES FOR OFFICE INFORMATION SYSTEMS


Fausto  Rabitti  • 


Abstract 


In this paper a special class of distributed databases is studied and a formal
model  is  presented.  This  model  is  characterized  by  a  few  special  properties  resulting 
from  its  use  in  Office  Information  Systems. 

We  consider  distributed  databases  for  which: 

-  data  distribution  is  by  station  ’ownership’; 

-  there  is  data  partitioning  instead  of  data  replication; 

-  updates  are  restricted  to  local  data; 

-  there  are  electronic  data  mailing  facilities  integrated  with  the  global  database 
management  system. 

In  this  environment,  the  problem  of  global  query  processing  is  studied.  The 
three  sub-problems  of  concurrency  control,  data  movement  control  and  optimal  execu¬ 
tion  strategy  are  discussed,  and  suitable  algorithms  are  presented. 


* 

1. INTRODUCTION


The immediate question we pose is: "Why do the distributed databases used
by Office Information Systems differ from ’traditional’ distributed databases?"

Several systems for distributed databases, such as Distributed Ingres [1],
SDD-1 [2], SIRIUS [3], POREL [4] and DATANET [5], have already been studied. We will refer
to SDD-1 [6] as a typical example of the traditional approach to distributed database
systems.


These  systems  share  a  few  basic  ideas: 

-  Data  is  stored  on  several  sites  and  the  global  system  controls  its  distribution. 

-  Data  is  replicated  at  different  sites  for  reliability  and  faster  access.  This  is  under 
control  of  the  system. 

-  The  system  gives  to  the  user  the  impression  of  dealing  with  a  centralized  data  base, 
with  essentially  the  same  services.  The  system  takes  charge  of  the  synchronization  of 
all  the  concurrent  activities  and  also  takes  charge  of  all  the  data  communication 
involved  (concurrency  control,  multiple  copy  consistency,  global  transaction  process¬ 
ing  optimization,  data  transfer  flow  control). 

In  the  environment  of  Office  Information  Systems,  the  stations  of  the  distri¬ 
buted  system  have  various  requirements.  For  example,  they  heavily  process  data  that 
is  located  inside  the  station  and  they  must  frequently  exchange  information.  Also  they 
must  occasionally  access  information  dispersed  over  several  sites.  Only  this  last 
activity  requires  an  integrated  control  of  the  distributed  database. 

A common scenario for an Office Information System (OIS) could be this:
hundreds of stations (micro/minicomputers), with fairly large storage space (ranging
from 10 to 100 Mbytes), connected by a communication network of rather large
bandwidth (e.g. 10 Mbps). The total amount of data managed inside the global system is quite
considerable (e.g. 1-10 Gbytes) and the possible number of transactions concurrently
processed inside the global system can be very high.

Handling this system using the traditional approach to distributed database
systems can be quite inefficient. The reason is that an SDD-1 type system does not
exploit the locality of most data processing transactions [9]. Every data reference is
handled as an access to the global database since there is no concept of data
being local to a station.

On the other hand, today’s Office Information Systems usually give a great
deal of autonomy to each station: it is intended to be a ’personal station’ with its own
database, internal procedures, and inter-station communications facilities. Each
station is allowed to freely access its own data, located in the station database, but usually
there is a complete lack of organized capabilities for accessing data distributed over
all the stations. The local databases are not seen as components of a global distributed
database, as in SDD-1 type systems. Thus, it is impossible for a station to make global
query references to the databases in every station. This facility would be very useful
indeed in an OIS environment.




The purpose of this work is to extend some of the facilities of traditional
distributed database systems to the office environment. In particular we will study global
query processing upon distributed databases possessing the special properties
associated with Office Information Systems.


2. SPECIAL PROPERTIES OF DISTRIBUTED DATABASES
FOR  OFFICE  INFORMATION  SYSTEMS 


We outline the main special properties that characterize the type of
distributed databases especially suitable for OIS.

1) High fragmentation of processing and storage capabilities

Our typical environment is characterized as follows:

- a very great number of stations, in the order of hundreds, which are nodes in a large
bandwidth communication network;

-  limited  processing  power  at  each  station; 

-  disk  space,  for  the  local  database  in  each  station,  of  the  size  of  a  file  cabinet  (5-10 
Mbytes)  for  personal  stations,  but  much  larger  (50-100  Mbytes)  for  archive  stations. 

The  local  DBMS  can  be  quite  simple,  due  to  the  usually  limited  size  of  the 
local  database,  and  it  is  generally  a  single-user  system.  Nevertheless,  the  total  size  of 
the distributed database is quite large (more than 1 Gbyte).

2) Data distribution upon an ’ownership’ principle

The data distribution is not controlled by the Distributed DBMS on the
ground of data content (as done in systems such as SDD-1). Each station may only
process data it ’owns’, that is data contained inside the local database. There is local
control of the local database.

3) Local scope of update operations

A  station  can  modify  only  that  data  which  it  owns.  One  station  cannot 
modify another’s data; in this sense we can say that a station ’owns’ its local data.

4)  Partitioning  of  data  instead  of  replication 

In our environment we do not allow data replication at different sites (unlike
systems such as SDD-1); the data constituting the global database is partitioned among
the local databases. Through the use of identification keys, each data item is globally
unique: no duplication is allowed in the distributed database.

Therefore, the reliability problem is handled inside each station (e.g. with
the use of transaction logs and back-up copies). Since the data in a local database will
be inaccessible if the node fails, provisions should be made for extra reliability and fast
recovery of each node.

5) Data movement by electronic mailing

Data  items  can  move  from  one  station  to  another,  that  is,  from  one  local 
database  to  another.  The  distribution  of  data  in  the  distributed  database  is  dynamic, 
not  static:  this  is  an  important  difference  from  the  traditional  approach  to  distributed 
database  systems. 

The station which owns a data item may fetch it from its local database and
send it, through the communication network, to another station which then becomes
the new owner of this data item. The transferred data item is first put in a special
database on the destination station (this database functions as a mail tray for incoming
data items). The control program of the station (on-line user or automatic procedures)
can then recognize that a data item has been sent to the station and can explicitly
take it from the mail tray and put it in the local database ("receive" function).

The purpose of the mail tray databases and this two-phase transfer
mechanism is to permit the station to explicitly control the data inserted into its local
database. In this manner the station continuously monitors the data which it owns: a data
item is owned by a station if it is created by the station or is received from another
station.
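
The two-phase transfer through a mail tray can be sketched as follows. This is
a minimal in-memory illustration, not the mechanism of any particular system:
the Station class, its method names and the dictionary-based databases are
assumptions, and network transport and failure handling are omitted.

```python
class Station:
    def __init__(self, name):
        self.name = name
        self.local_db = {}    # key -> data item owned by this station
        self.mail_tray = {}   # key -> incoming item not yet accepted

    def create(self, key, item):
        """A station owns every item it creates."""
        self.local_db[key] = item

    def send(self, key, destination):
        """Phase 1: remove the item locally and place it in the remote tray."""
        destination.mail_tray[key] = self.local_db.pop(key)

    def receive(self, key):
        """Phase 2: the station explicitly accepts an item from its tray."""
        self.local_db[key] = self.mail_tray.pop(key)


sales, accounts = Station("sales"), Station("accounts")
sales.create("order-17", {"customer": "Acme", "total": 300})
sales.send("order-17", accounts)
in_tray_only = ("order-17" in accounts.mail_tray
                and "order-17" not in accounts.local_db)
accounts.receive("order-17")
```

Because send removes the item from the sender's local database, the item exists
in exactly one place at any time, consistent with partitioning instead of
replication.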

6)  Compatibility  of  data  organisation  in  each  station 

The data model should be the same for all local databases, and the same
local DBMS should be used in each station. This permits global queries to be processed
at each station in a uniform manner, avoiding costly model and system conversions.

The  data  schema  too  must  be  identical,  to  allow  any  data  item  to  be  freely 
moved from a station to another. Thus, if the relational data model is chosen, each
local  database,  including  each  mail  tray  database,  should  have  the  same  relation 
definitions.  This  assumption  is  not  always  true  in  real  systems,  where  schema  of  local 
databases  may  be  adapted  to  needs  of  the  particular  station.  However,  we  can  imagine 
a  global  schema  with  all  possible  relations  in  the  OIS  distributed  database;  a  station  will 
actually  use,  for  its  local  database,  only  the  needed  subset.  In  this  case  the  global 
schema  must  be  accessible  to  each  station  although  not  necessarily  stored  on  each 
station. Hence when a station must enter a tuple in a locally undefined relation, it can
dynamically  add  the  relation  to  its  local  database.  As  a  result,  we  can  assume  that 
each  relation  is  potentially  defined  in  each  local  database. 
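
The idea of a global schema from which each station defines relations on demand
can be sketched as follows. The schema contents, the LocalDatabase class and the
representation of relations as lists of tuples are all illustrative
assumptions.

```python
# Hypothetical global schema: relation name -> attribute names.
GLOBAL_SCHEMA = {
    "invoice": ("invoice_no", "customer", "total"),
    "memo": ("memo_no", "author", "text"),
}


class LocalDatabase:
    def __init__(self, global_schema):
        # The global schema is accessible to the station; it need not be
        # stored there, but a plain reference keeps the sketch simple.
        self.global_schema = global_schema
        self.relations = {}   # locally defined relations only

    def insert(self, relation, row):
        attributes = self.global_schema[relation]   # KeyError if unknown
        if len(row) != len(attributes):
            raise ValueError(f"{relation} expects {len(attributes)} values")
        # Define the relation locally the first time a tuple arrives for it.
        self.relations.setdefault(relation, []).append(tuple(row))


db = LocalDatabase(GLOBAL_SCHEMA)
defined_before = sorted(db.relations)
db.insert("invoice", (17, "Acme", 300))
defined_after = sorted(db.relations)
```

A station therefore carries only the subset of relations it has actually used,
while any relation of the global schema remains potentially definable.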

7) Movement principle exploiting locality of data processing

The behaviour of an OIS distributed database system is different from the
behaviour of systems such as SDD-1. In the latter, any node can issue a transaction
that updates data in the entire system. The update transaction is moved to the sites
where data is to be modified or stored; the system knows where such data is. The
result is that a transaction may interfere with others, hence the control system must
provide for their synchronization [10]. This approach is necessary in the case that the
pattern of references for update transactions is randomly scattered over the whole
body of data; that is, when no ’locality’ of updates can be found in time. If we apply
this approach to our environment, the synchronization overhead becomes very high
since very many transactions can be submitted to the system in the same period due
to the large number of stations. Very powerful nodes would be required to bear this
load.


Instead, in an OIS environment the data, rather than the update transaction,
is moved to the site where the update is to be performed. This movement is
determined by the explicit will of the stations, instead of the global control system. It is
determined by the particular functions of the various stations, on the basis of the
information contained within the data to be moved. In this manner, data is
sequentially moved to the stations where it is to be processed. This behaviour is consistent
with the usual data processing organization in an office environment.

In order to be effective, this approach requires 'locality' of updates: the
processing of data in the system must be concentrated sequentially in one station after
another. The great advantage of this approach is that no mechanism is necessary for
the concurrency control of updates among different stations. Across the whole
distributed database, consistency control is necessary only for the global query transactions.

8) Requirements for global query processing

In this system organization, it is important that activities caused by global
query processing interfere as little as possible with activities of other stations (these
activities include the local processing of data and inter-station movement of data). The
network nodes are often 'personal' stations of on-line users. These users should not feel
hindered or interfered with by any global control system.

For  these  reasons  we  have  the  following  requirements; 

-  In  each  station,  no  external  global  query  activity  may  block  the  concurrent  local  data 
processing  and  inter-station  data  movement.  Moreover,  the  local  activities  should  have 
priority  of  execution  over  external  global  query  activities.  The  external  activities 
should  try  to  exploit  the  station  processing  power  when  it  is  available. 

- No local locking that waits for an external event may occur in the station. The
only locking allowed should be to implement atomic local operations which provide
mutual exclusion in accessing sensitive data.


3. A FORMAL MODEL FOR OIS DISTRIBUTED DATABASES


We  proceed  by  describing  the  components  and  the  functions  of  our  formal 
model  for  OIS  distributed  databases.  This  model  will  be  used  in  our  study  of  global 
query  processing. 


3.1. Components of the Model

1) DDB (Distributed Data Base) is the global database distributed among all the
stations. We assume that the relational data model is used for the organization of DDB as
well as of all local databases composing it.

2) DDB-Schema is the schema of the global database. It is common to all local
databases. It consists of a set of M Relations:

$R^j$ for $j = 1, \ldots, M$

3) In the distributed system there are N Stations:

Si for i = 1, ..., N

They are connected through a Reliable Network, a computer communication network
with the following characteristics:

A)  Guaranteed  delivery  of  messages. 

B) Continuous station monitoring - i.e., active stations are informed if a
station fails.

C) Station clock synchronization - this is a network mechanism which
synchronizes all station clocks to within an acceptably close range called SCMD
(Station Clock Maximum Difference) [11].

This  Reliable  Network  does  not  need  a  reliable  writing  mechanism  for  posting  updates 
at  multiple  sites  as  in  the  two-phase  commit  of  SDD-1  [6]. 

4) In each station Si (i = 1, ..., N) there is a Local Database, LDBi, where all data
owned by the station is organized.

Each LDBi (i = 1, ..., N) is composed of two separate databases:

A)  MTi  is  the  mail  tray  database  of  the  station; 

B)  DBi  is  the  database  for  the  data  stored  at  the  station  and  available  for 
processing. 

5) We call the Local Relation, $R_i^j$, the set of tuples of the relation definition $R^j$
belonging to the local database LDBi.

6) Each tuple in DDB contains a special key attribute, called TGID (Tuple Global
IDentifier), which is unique in the global database. We can define a KEY function for
each tuple t of DDB: KEY(t) = TGID of t.

If $t_1$ and $t_2$ are distinct tuples of DDB, we are sure that $KEY(t_1) \neq KEY(t_2)$. From this
we have the uniqueness of each tuple in the entire distributed database:

if $t \in DDB$ then there exists $i$ such that $t \in LDB_i$; furthermore, for all $r \neq i$,
$t \notin LDB_r$. Also, since $LDB_i = MT_i \cup DB_i$, we have that either $t \in DB_i$ or
$t \in MT_i$.

7) We have these relationships holding:

- $DDB = \bigcup_{i=1}^{N} LDB_i$ and $LDB_{i_1} \cap LDB_{i_2} = \emptyset$ $(i_1 \neq i_2)$

So DDB is partitioned into N local databases LDBi.

- Assuming $R^j$ to be the set of all tuples, in DDB, of relation definition $R^j$, we have:

$DDB = \bigcup_{j=1}^{M} R^j$

- For each $i = 1, \ldots, N$: $LDB_i = \bigcup_{j=1}^{M} R_i^j$

- For each $i$ and $j$: $R_i^j = R^j \cap LDB_i$

- $R^j = \bigcup_{i=1}^{N} R_i^j$ and $R_{i_1}^j \cap R_{i_2}^j = \emptyset$ $(i_1 \neq i_2)$

So each $R^j$ is also partitioned into N local relations $R_i^j$.


3.2.  Functions  of  the  Model 


There are three main types of activities in our model: local transactions,
data mailing and global queries.


3.2.1. Local Transactions


A local transaction involves accessing local data; this operation does not
interact with any other station. A local transaction is really any type of access, allowed
by the relational DBMS operating in each station, to the station database DBi (note:
not to the mail tray MTi).

We define:

- $q^i$ as a station query operation on DBi;

- $u^i$ as a station update operation on DBi.


3.2.2.  Data  Mailing 


Mailing activities allow for the moving of data from one station to another.
We suppose that for each mailing activity, only a single data item (a tuple, in the
relational data model) can be moved at a time.

We define two mailing activities.

- SEND(t: Si→Sj)

This operation deletes a tuple t from DBi, sends it through the communication
network, and inserts it into MTj.

- REC(t: Sj)

This  operation  deletes  a  tuple  t  from  MTj  and  inserts  it  into  DBj.  This  operation  is  local 
to  the  station  Sj.  Only  after  t  has  been  ’received’  into  DBj  can  it  be  processed  by  the 
procedures  of  Sj. 

Other, more complex mailing operations can be devised, such as moving
tuples in batch, or receiving tuples globally (moving the entire contents of MTj into DBj)
or selectively (just tuples coming from specified stations). Nevertheless, SEND and REC
are the only mail operations that need to be defined for our model.
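
The two mailing operations can be sketched as below. This is a minimal illustration under stated assumptions: the Station class, the dictionary representation keyed by TGID, and the helper owner_count are all hypothetical, and the network transfer is reduced to a direct assignment.

```python
class Station:
    def __init__(self, name):
        self.name = name
        self.db = {}         # DB_i: TGID -> tuple available for processing
        self.mail_tray = {}  # MT_i: TGID -> tuple delivered but not yet received


def send(tgid, src, dst):
    """SEND(t: Si -> Sj): delete t from DB_i and insert it into MT_j."""
    dst.mail_tray[tgid] = src.db.pop(tgid)


def rec(tgid, station):
    """REC(t: Sj): delete t from MT_j and insert it into DB_j."""
    station.db[tgid] = station.mail_tray.pop(tgid)


def owner_count(tgid, stations):
    """Number of places (DB or mail tray) currently holding the tuple."""
    return sum((tgid in s.db) + (tgid in s.mail_tray) for s in stations)


s1, s2 = Station("S1"), Station("S2")
s1.db["t1"] = ("t1", "memo")
send("t1", s1, s2)   # t1 now sits in the mail tray of S2
rec("t1", s2)        # t1 is now available for processing at S2
```

At every step owner_count("t1", [s1, s2]) stays at 1, which mirrors the partition property of the model: a tuple belongs to exactly one local database.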


3.2.3.  Global  Queries 


Let gq be a query operation which requires global access to the data in DDB.
Supposing that gq is entered at the station Si, Si then becomes the Master Station for
the global query gq. Si analyses gq and controls its execution. Also Si broadcasts,
through the network, control information and data to other stations and receives
control information and data from them. With respect to the global query gq, the stations
other than Si are called Slave Stations. The global query processing may also require
information exchange, controlled by the Master Station, between Slave Stations.

When Si processes $gq_n$ (the n-th global query), it decomposes it into
corresponding local queries $lq_n^j$, one for each local database LDBj at each station Sj
(j = 1, ..., N). It is important to remember that in our environment there is no system
knowledge about the database distribution (i.e., on the basis of some attribute
contents of the relations, such as fragment definition and distribution in SDD-1 [6]). LDBj
contains all the data owned by Sj: that is, all the tuples created by Sj or sent to it by
other stations. No other station has complete knowledge about the contents of LDBj.
So, the Master Station Si must assume that each local database has the same
DDB-Schema. For all $j_1$ and $j_2$ ($j_1 \neq j_2$), as far as Si knows, $LDB_{j_1}$ and $LDB_{j_2}$ have the
same relation definitions $R^1, \ldots, R^M$. Hence $lq_n^{j_1} = lq_n^{j_2}$, and so we may drop the
superscript for all $lq_n$.

The local query lq, relative to the global query gq, contains all projection
and restriction operations specified in gq for all the relations involved. We say that
lq performs a reduction on each local database; let lq(LDBj) be the reduction
performed by lq on LDBj. These operations are distributive:

$lq(LDB_i \cup LDB_j) = lq(LDB_i) \cup lq(LDB_j)$

so they do not require any interaction with data in other local databases. The
synchronization of the processing of lq on LDBj with the other local and global activities in
Sj is performed by the Concurrency Control Algorithms (Sec. 5) and the Data Movement
Control Algorithms (Sec. 7). When lq is processed on LDBj, the reduction lq(LDBj) is
stored at Sj.
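
The distributive property can be checked on a toy case. The sketch below assumes a hypothetical local query made of one restriction (amount > 100) and one projection (tgid, amount); the data and attribute names are invented for illustration.

```python
def lq(tuples):
    """Hypothetical local query: restrict to amount > 100, project (tgid, amount)."""
    return {(t["tgid"], t["amount"]) for t in tuples if t["amount"] > 100}


ldb_i = [
    {"tgid": "t1", "amount": 50, "owner": "S1"},
    {"tgid": "t2", "amount": 150, "owner": "S1"},
]
ldb_j = [{"tgid": "t3", "amount": 300, "owner": "S2"}]

# Reducing the union gives the same result as uniting the reductions,
# so each station can reduce its own local database independently.
reduced_whole = lq(ldb_i + ldb_j)
reduced_parts = lq(ldb_i) | lq(ldb_j)
```

Because the local databases are disjoint (each TGID lives in one station), concatenating the lists stands in for the set union here.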

The last phase of the processing of gq is then performed and the final
answer gq(DDB) can be computed. The Master Station Si, analysing the information
about each lq(LDBj) sent by all Sj, chooses the optimal strategy for the final global
query processing. It aims to further reduce and then collect the reductions lq(LDBj),
minimizing the amount of data transfer through the network. This is done if at least one
join is present in gq; otherwise all reductions lq(LDBj) are simply collected into Si.
The problem of global query strategy optimization is discussed in Sec. 3.


4. CONCURRENCY CONTROL FOR GLOBAL QUERIES


There  are  two  problems  in  the  processing  of  global  queries  in  our  model: 

-  concurrency  control  of  interfering  global  and  local  activities  in  the  system; 

- control of data movement due to mailing activities in the system.

In studying the concurrency control problem, we assume the distributed
database to be static, not dynamic. We also assume that all data movement activities
(SEND and REC mail operations) are blocked while any global query is in execution.

In each station Si, the concurrency control must synchronize the global query
activities (lq corresponding to gq) requested by other stations with the local transaction
activities (queries $q^i$ and updates $u^i$). All these transactions (lq, $q^i$, $u^i$) may interfere
in accessing the local database LDBi: precisely, lq needs to access the local database
LDBi (that is, $MT_i \cup DB_i$), while $q^i$ and $u^i$ access the station database DBi.

Local transactions, $q^i$ and $u^i$, can be considered to be entered into the
station Si one at a time, sequentially. They are processed serially and they do not interfere
with each other. Also, $q^i$ and lq cannot interfere because they are both simple query
transactions. The only possible interference is between lq and $u^i$: local activities
of global queries and local updates. The concurrency control problem can be
restricted to them.

Another observation is that, inside each Slave Station Si, the local
transactions, $q^i$ and $u^i$, have priority over lq. The order of execution of local updates is
determined by the order in which they are entered into the station and cannot depend upon
the occurrence of some lq. So we must control the order of execution of lq, in all Slave
Stations, to guarantee the correctness and consistency of the answer to the
corresponding global query gq. This is the goal of any algorithm for concurrency control in this
type of distributed database.


4.1. Serializability Problem


In order to discuss the concurrency control problem, we apply to our model
the Serializability Theorem studied in general for distributed databases [7]. This
theorem tells us when the execution of global transactions is serializable
(computationally equivalent to a serial execution) and hence correct [8].


Definitions


- Let $GT = GQ \cup U$:

GT is the set of all the transactions on the distributed database DDB.

- $GQ = \{ gq_1, gq_2, \ldots, gq_{MAX} \}$

GQ is the set of global queries gq on DDB.

- $U = \bigcup_{i=1}^{N} U_i$ and $U_i = \{ u^i_1, u^i_2, \ldots \}$ for $i = 1, 2, \ldots, N$

U is the set of local updates $u^i$ inside all stations Si.

- E is an execution of GT on DDB, that is, a particular history of all the transactions in
GT.

- LOGi, for i = 1, ..., N, is the record (ordered set), in the exact order of occurrence,
of the transactions on the local database LDBi of the station Si. The transactions
recorded in LOGi are all $u^i$ and all lq. An execution E is completely described by the
set of all LOGi, for i = 1, ..., N.

- $\overline{GT}$ and $\overline{GQ}$ are possible total orderings of GT and GQ respectively. Given two
transactions $tr_1$ and $tr_2$ of any type, we say that $tr_1 < tr_2$ ($tr_1 > tr_2$) in a particular ordering
(i.e., $\overline{GT}$, $\overline{GQ}$, LOGi etc.) iff $tr_1$ comes before (after) $tr_2$ in this ordering.

- If the write-set of $u^i$ intersects the read-set of lq, we say that $u^i$ and lq are conflicting
upon LDBi. We define lq1 and lq2 as read-conflicting upon LDBi iff their read-sets are
not disjoint.

- We define $U[lq_1 - lq_2]_i$ as the set of all local updates $u^i$, in Si, which conflict with
both lq1 and lq2 and for which $lq_1 < u^i < lq_2$ in LOGi.


THEOREM-1

Let $GT = GQ \cup U$ and E be an execution of GT. Then E is serializable iff there exists a
total ordering, $\overline{GQ}$, with the following property:

Let $gq_1, gq_2 \in GQ$.

If for some station Si: $lq_1 < lq_2$ in LOGi, and lq1, lq2 read-conflict on LDBi, and

$U[lq_1 - lq_2]_i \neq \emptyset$,

then $gq_1 < gq_2$ in $\overline{GQ}$.


Note that if there exists the correct total ordering $\overline{GQ}$, as specified in
Theorem-1, this implies that there must exist also a corresponding total ordering $\overline{GT}$
which is correct according to the more general Serializability Theorem [7].

From Theorem-1 we have a procedure to find a serializable execution E.


Synchronization Technique

If in a station Si, lq1 is executed before lq2 on LDBi, and between them at
least one local update operation $u^i$, conflicting with both lq1 and lq2, is executed, we have
$gq_1 < gq_2$ in the total ordering $\overline{GQ}$ of GQ.

Thus, in all other stations Sj ($j \neq i$), we must have either:

- lq1 is executed before lq2;
or

- lq2 is executed before lq1 and no local update $u^j$, conflicting with both lq1
and lq2, is executed on LDBj between them.


Any algorithm for the processing of the global queries in GQ on DDB which
follows the above synchronization technique produces a correct execution. This
synchronization technique is much simpler than that used in SDD-1 [10].


4.2.  Example  of  Execution 


Here an example is given of a simple execution of a global query, in order to
explain the effect of Theorem-1.

In LDBi the data item X initially has the value a1 and then some time later,
because of a local update operation, has the value a2. Similarly, in LDBj the data item
Y initially has the value b1 but at some later time has the value b2.

- in LDBi, X: a1 → a2;

- in LDBj, Y: b1 → b2.

We consider two global queries gq1 and gq2. In Si, lq1 is executed before
lq2; lq1 sees X as a1 and lq2 sees X as a2.

In Sj, lq1 is executed before lq2; lq1 sees Y as b1 and lq2 sees Y as b2. This
meets the conditions of Theorem-1 and we have a correct execution, with $gq_1 < gq_2$
in the total ordering $\overline{GQ}$. In fact gq1 sees X as a1 and Y as b1, and gq2 sees X as a2 and
Y as b2. This result is consistent with the history of X and Y.

Suppose the conditions of Theorem-1 are not fulfilled on Sj. Let lq1 be
executed after lq2 (lq2 sees Y as b1 and lq1 sees Y as b2); then we have an incorrect
execution. In fact gq1 sees X as a1 and Y as b2, and gq2 sees X as a2 and Y as b1. Whether we
consider gq1 as being before or after gq2 in $\overline{GQ}$, their execution is inconsistent with the
history of X and Y.
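
The example can also be checked mechanically. The following sketch is an illustration, not the report's procedure: it assumes that every local update (written "u") conflicts with every local query and that all local queries read-conflict, derives the orderings forced by Theorem-1 from each station log, and tests whether a total ordering compatible with all of them exists. The function names and log encoding are hypothetical.

```python
from graphlib import TopologicalSorter, CycleError


def forced_orderings(log):
    """Pairs (a, b): a precedes b in the log with at least one update between them."""
    pairs = set()
    for i, a in enumerate(log):
        if a == "u":
            continue
        seen_update = False
        for b in log[i + 1:]:
            if b == "u":
                seen_update = True
            elif seen_update:
                pairs.add((a, b))
    return pairs


def is_serializable(logs):
    """True iff the forced orderings of all station logs admit a total ordering."""
    preds = {}
    for log in logs:
        for a, b in forced_orderings(log):
            preds.setdefault(a, set())
            preds.setdefault(b, set()).add(a)
    try:
        list(TopologicalSorter(preds).static_order())
        return True
    except CycleError:
        return False


log_i = ["lq1", "u", "lq2"]        # X: a1 -> a2 between lq1 and lq2
good_log_j = ["lq1", "u", "lq2"]   # Y: b1 -> b2 between lq1 and lq2
bad_log_j = ["lq2", "u", "lq1"]    # lq1 executed after lq2, update in between
```

With these logs, the pair (log_i, good_log_j) admits the ordering gq1 < gq2, while (log_i, bad_log_j) forces both gq1 < gq2 and gq2 < gq1 and is therefore rejected.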


5. CONCURRENCY CONTROL ALGORITHMS


We can distinguish two different cases: Centralized Concurrency Control and
Distributed Concurrency Control.


5.1. Concurrency Control on the Centralized Control Model


In this case we have a special network node, CN, with special functions:
control of the data movement operations (mailing of tuples) and processing of global
queries. Such a network differs from that presented in Sec. 3, where each station is
symmetric with respect to functions and features. In particular, the differences from
the distributed model previously presented are:

- LDBi, the local database of station Si, consists only of DBi (the station
database);

-  all  the  mail  trays,  MTi,  are  located  at  the  control  node  CN. 

The  data  mailing  is  performed  in  two  stages,  with  the  intervention  of  CN: 

- during the send operation, SEND(t: Si→Sj), the tuple t is taken from DBi at Si and
is sent, through the network, to the central node CN, where it is inserted into the mail
tray MTj of the station Sj;

- during the receive operation, REC(t: Sj), the tuple t is taken from MTj and sent,
through the network, from CN to Sj where it is inserted into DBj.

The processing of global queries is also controlled by CN: when a station
submits a global query gq, it is first sent to CN. CN then orders the global query by
using the time of the query's arrival at CN. So for each gq there is a unique
Timestamp, TS(gq), which is the value of the CN clock [12]. CN broadcasts lq to Sj
(j = 1, ..., N), following the order of TS(gq). The Reliable Network ensures that if lq1 is
sent to Sj before lq2 then lq1 will be received by Sj before lq2. So each station Sj can
perform every lq on DBj in the same gq Timestamp order. Therefore, the conditions of
Theorem-1 are satisfied and the execution of each gq is correct. Moreover, this
centralized control strategy imposes conditions stronger than those of Theorem-1: the
execution of all gq is not only serializable, but also serial.
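
A minimal sketch of this central ordering follows. It assumes a counter in place of the CN clock and an in-memory list per station in place of the Reliable Network; the class and method names are hypothetical.

```python
import itertools


class ControlNode:
    def __init__(self, stations):
        self.stations = stations          # station name -> list of (TS, query)
        self._clock = itertools.count(1)  # stand-in for the CN clock

    def submit(self, gq):
        """Stamp gq on arrival and broadcast it to every station in TS order."""
        ts = next(self._clock)            # unique Timestamp TS(gq)
        for inbox in self.stations.values():
            inbox.append((ts, gq))        # order-preserving delivery is assumed
        return ts


cn = ControlNode({"S1": [], "S2": [], "S3": []})
ts_a = cn.submit("gq_a")
ts_b = cn.submit("gq_b")
```

Every station ends up with the same sequence of queries, which is why the resulting execution is serial and not merely serializable.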

The control node must also perform each lq on the mail trays MTj after lq
has been performed at all stations. The lq are processed by CN in the same order as
they were performed at the stations. In this model CN also controls the optimal
processing of the local reductions lq(DBj). The final answer gq(DDB) is finally transmitted
to the station Sj which made the original request. The communication pattern in this
model implies a star type topology for the Reliable Network, where CN is the central
node and all Sj are the satellites.

This model is useful when we locate most of the coordination and
communication logic at a more powerful central control node CN. This allows the other nodes to
be relatively simple. The choice of locating all the mail trays MTj inside CN allows the
local nodes to be temporarily disconnected without totally blocking mail between
them. The obvious trade-off is that the performance of the global system is limited by
the bottleneck of the control node's processing power. If the number of local stations
is very high (e.g., hundreds concurrently active) it can be quite difficult for a control
node to keep pace with them.


5.2. Distributed Concurrency Control


The algorithms for the concurrency control of global query processing in
the distributed model are based on the concept of Timestamp (TS) Ordering [12].
When the global query gq is submitted to the Master Station Si, a Timestamp, TS(gq),
is given to gq using the value of the station clock. The Timestamp is unique within
the entire system; so $TS(gq_n) = TS(gq_m)$ implies $n = m$. When the Master Station Si
sends lq to the Slave Stations Sj ($j \neq i$), along with lq is sent its Timestamp TS(lq),
which is equal to TS(gq).

There is a natural total ordering of all Timestamps: it is always possible to
say that either TS(gq1) < TS(gq2) or TS(gq1) > TS(gq2). The concept of Timestamp
Ordering is that the total order of TS(gq) must be imposed on the execution of
conflicting global queries at every local database. If lq1 and lq2 are read-conflicting on
a local database LDBj and need to be synchronized (see the conditions of Theorem-1),
and if TS(lq1) < TS(lq2), then lq1 must be performed before lq2 on LDBj (lq1 < lq2 in
LOGj).


Several fundamental algorithms for the concurrency control in distributed
databases have been studied [12]. In our environment all the algorithms based on the
two-phase locking concept are inadequate: data locking situations which wait for
external conditions are present in them. In this case, local update operations would be
blocked by external operations, against our assumption of Sec. 2. Among the TS
Ordering algorithms, three can be adapted to our model: the Basic Algorithm, the
Multi-Version Algorithm and the Conservative Algorithm [12]. The Basic TS Ordering
Algorithm, although simple, is in practice useless: in fact lq can be rejected by Sj if it
comes in the wrong TS order. In this case the whole gq is aborted and must restart. In
our environment, due to the large number of stations, this event could be very likely.
We now present the other two algorithms, particularly adapted for our formal model.


5.2.1. Multi-Version TS Ordering Algorithm in an OIS Environment


The purpose of this algorithm is to control the TS ordering of conflicting lq
without rejecting those which have arrived in the wrong order. We suppose that the
granularity of the control of interference is simply at the level of the local relation. The
algorithm described here imposes stricter conditions than those required for the
Multi-Version TS Ordering in [12]. These conditions are in effect unnecessary for
concurrency control alone, but they are necessary in order to extend this algorithm for
data movement control (Sec. 7.1).

At each station Sj, for each local relation $R_j^i$, we keep a record of the
several versions of $R_j^i$ resulting from the sequential local updates $u^j$ upon $R_j^i$. Let
$VERS[R_j^i](u^j)$ be the version of $R_j^i$ after update $u^j$ has been processed. $VERS[R_j^i](u^j)$
will not be, of course, an entire copy of $R_j^i$, but a record of the differences between the
last version of $R_j^i$ and a previously saved version, $R_j^i$-SAVE (i.e. for back-up purposes).
We have $VERS[R_j^i](u^j) = R_j^i$ if $u^j$ is the last update operation on this local relation.

Associated with each $VERS[R_j^i](u^j)$, there is a time mark TV. This TV is a TS showing
the time when $u^j$ was performed on $R_j^i$ according to the station Sj clock.

The idea of this algorithm is to perform lq on the version of $R_j^i$ that was the
most recent version when lq was issued. In effect the result is relative to the clocks in
the stations Si and Sj, which respectively determine TS(lq) and the TV of
$VERS[R_j^i](u^j)$ (it is not true for absolute times).

According to this algorithm, when lq arrives at Sj and it involves the
relation $R_j^i$, we find the maximum TV such that TS(lq) > TV. Then lq is performed on the
$VERS[R_j^i](u^j)$ associated with this TV.

The main problem of this algorithm is that at each Sj the number of
recorded versions of the local relations can grow beyond any limit and saturate the
local storage. To avoid this problem, a technique may be exploited to 'forget' the
oldest versions [12], which are assumed to be no longer of use.

With this forgetting technique, all $VERS[R_j^i](u^j)$ are deleted if the
associated TV contains a $TS(u^j)$ such that:

$TS(u^j) < [\text{Station Clock}] - MED$

where MED is the Maximum Estimated Delay for the execution of any global query in
the distributed system (MED should include SCMD (Sec. 3.1)).
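
Version selection and forgetting can be sketched together as follows. This is an illustration under simplifying assumptions: whole versions are stored instead of difference records, time marks are plain numbers, and the latest version is always retained because it is the current relation. The class VersionedRelation and its methods are hypothetical names.

```python
import bisect


class VersionedRelation:
    def __init__(self, initial, tv=0):
        self.tvs = [tv]            # time marks TV, in ascending order
        self.versions = [initial]  # versions[k] is the relation as of tvs[k]

    def update(self, new_value, tv):
        """Record the version produced by a local update performed at time tv."""
        self.tvs.append(tv)
        self.versions.append(new_value)

    def read(self, ts_lq):
        """Return the version with the maximum TV such that TS(lq) > TV."""
        k = bisect.bisect_left(self.tvs, ts_lq) - 1
        if k < 0:
            raise LookupError("no retained version is older than TS(lq)")
        return self.versions[k]

    def forget(self, station_clock, med):
        """Delete versions with TV < station_clock - MED, keeping the latest one."""
        cutoff = station_clock - med
        k = min(bisect.bisect_left(self.tvs, cutoff), len(self.tvs) - 1)
        del self.tvs[:k]
        del self.versions[:k]


r = VersionedRelation("v0", tv=0)
r.update("v1", tv=10)
r.update("v2", tv=20)
late_answer = r.read(15)   # a query stamped 15 arriving late still reads "v1"
```

A local query that arrives out of Timestamp order is therefore served from an older version instead of being rejected, at the cost of the extra storage bounded by the forgetting rule.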

We can expect that only a few versions need to be saved for each local relation,
since only a few updates can really occur during the last MED time period (recall that
each local database is a personal database). Therefore, in our environment the extra
space required by this algorithm is quite acceptable.


5.2.2. Conservative TS Ordering Algorithm in an OIS Environment


The purpose of this algorithm is to totally control the order of execution of
all the global queries, ensuring that this order is the same in all the stations. At each
station Si, every lq is executed on LDBi in strict TS(lq) order.

This algorithm ensures a type of global query synchronization very similar
to that of the Centralized Concurrency Control (Sec. 5.1). The result is that the
restrictions imposed are much stronger than those imposed by Theorem-1: regardless of
whether any lq1 and lq2 are conflicting, or whether they are separated by conflicting
local updates, in all cases they are processed according to their TS order. Thus, the
execution of all gq is not only serializable (as in the previous algorithms) but even serial.
Nevertheless, the strict total ordering imposed by this algorithm at all stations turns
out to be very useful for the data movement control (Sec. 7.2).

At each station Si we define N-1 queues QUEUEj, with $j \neq i$, one for each of
the other stations in the distributed system. A request lq, coming from its Master
Station Sj, is put in QUEUEj. Note that the Reliable Network ensures that if Sj sends
lq1 before lq2 to Si, lq1 will be before lq2 in QUEUEj of Si.

According to this algorithm, a station Si, in order to process a request lq for
a global query, proceeds as follows:

- it waits until there is at least one lq in each QUEUEj;

- it chooses the lq such that TS(lq) is the minimum over all QUEUEj (considering
only the first element in each QUEUE), for j = 1, ..., N and $j \neq i$;

- then it can process lq on LDBi.

There is a problem with this algorithm: if some Sj does not send any lq for a
long period of time, all other stations Si cannot meanwhile process another lq. To
avoid such unlimited delays, various techniques have been studied [12]:

- the use of dummy lq, broadcast at fixed periods by idle stations;

- the use of special Timestamps, bearing arbitrarily large times, for the
dummy lq, in order to avoid excessive network traffic. They must be
preempted when a real lq from the same station is to be performed.

In our environment, with such a large number of stations, another
technique, avoiding the transmission of dummy lq through the network, can be used. Here
we must assume a reasonable maximum time of propagation, TP, of a message through
the Reliable Network. When the station Sj has to execute lq1, and QUEUEi is empty, Sj
can assume that no lq2 with TS(lq2) < TS(lq1) was sent by Si if:

$TS(lq_1) < [\text{Station } S_j \text{ Clock}] - TP - SCMD$

where SCMD is the Station Clock Maximum Difference, ensured by the Reliable Network
(Sec. 3.1).
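
As a small worked illustration of this test, the condition can be written as a predicate. The function name and the numeric values are hypothetical; all quantities are assumed to be expressed in the same time unit.

```python
def no_earlier_query_possible(ts_lq1, station_clock, tp, scmd):
    """True if the station may assume no lq2 with TS(lq2) < TS(lq1) is in transit.

    ts_lq1        -- Timestamp of the local query waiting to be executed
    station_clock -- current value of the executing station's clock
    tp            -- assumed maximum propagation time of a message
    scmd          -- Station Clock Maximum Difference
    """
    return ts_lq1 < station_clock - tp - scmd


# With TP = 20 and SCMD = 5, a query stamped 100 may run once the clock passes 125.
can_run_at_130 = no_earlier_query_possible(100, 130, tp=20, scmd=5)
can_run_at_120 = no_earlier_query_possible(100, 120, tp=20, scmd=5)
```

Any query stamped earlier than 100 by the silent station would have been sent before its clock read 100, and so would already have arrived once the local clock exceeds 100 + TP + SCMD.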


6. DATA MOVEMENT CONTROL


The previous discussion of the concurrency control problem refers to a
static distributed database. No data movement between different local databases at
different stations was allowed. The problem now is to consider our model in its full
generality: we have to extend the previous results to the case of the dynamic distributed
database where data mailing operations are allowed, i.e., data items can be transferred
from one local database to another.

We have defined the two mailing activities: SEND(t: Si→Sj) and REC(t: Sj).
We can consider the operation of moving the tuple t from a database A to a database B
as the simultaneous execution of two update operations: the first update erases t in the
database A, the second update creates t in the database B. These two updates should
be contemporaneous (i.e. an instantaneous transfer of t) in order to avoid these two
conditions:

- at a certain moment, t is in both databases A and B;

- at a certain moment, t is in neither database A nor B.

This requirement, for SEND(t: S_i → S_j), implies the use of a combined technique of acknowledgments, between S_i and S_j, and locking on DB_i and MT_j. This requirement is more easily implemented for REC(t: S_j). Since this operation occurs completely within S_j, we have only to temporarily lock the relation to which t belongs, in MT_j and DB_j.

Thus, we have for SEND(t: S_i → S_j):

- u1 on DB_i (inside LDB_i of S_i): delete t;

- u2 on MT_j (inside LDB_j of S_j): insert t.

We have for REC(t: S_j), inside LDB_j of S_j:

- u3 on MT_j: delete t;

- u4 on DB_j: insert t.
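The decomposition of SEND and REC into paired updates can be sketched as follows. This is a minimal model under simplifying assumptions: the class and function names are hypothetical, tuples are represented by their keys, and the acknowledgment and locking machinery that makes each pair appear instantaneous is left out.

```python
from dataclasses import dataclass, field


@dataclass
class Station:
    db: set = field(default_factory=set)   # station database DB
    mt: set = field(default_factory=set)   # mail tray MT

    def ldb(self) -> set:
        """The local database LDB is the union of DB and MT."""
        return self.db | self.mt


def send(t, src: Station, dst: Station) -> None:
    src.db.remove(t)   # first update: delete t from DB of the sender
    dst.mt.add(t)      # second update: insert t into MT of the receiver


def rec(t, station: Station) -> None:
    station.mt.remove(t)   # first update: delete t from MT
    station.db.add(t)      # second update: insert t into DB


if __name__ == "__main__":
    s_i, s_j = Station(db={"t"}), Station()
    send("t", s_i, s_j)
    rec("t", s_j)
    print(s_i.ldb(), s_j.ldb())
```

In this sketch each pair of updates runs inside one function call, which stands in for the atomic transfer required by the text.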

Recall that in our OIS environment, knowing the owner of a data item is valuable information about that data item. So a global query involving a tuple t may be interested in knowing both the contents of t and its location (and even whether t, located in LDB_i, is in the mail tray MT_i or in the station database DB_i). The update operations u1, u2, u3 and u4 change the location of t, and so may be said to alter some information about t: i.e., in which database t is located. Since the update operations derived from the data movement operations (SEND and REC) change information about the data items, they must be considered, together with the common local updates, to affect the correctness of global query processing. All these special update operations, u1 on LDB_i, and u2, u3, u4 on LDB_j, must be properly synchronized with the local operations lq of the global queries gq. We may call this data movement consistency control.


We can now restate Theorem-1 to include, in addition to the concurrency
control  conditions,  the  data  movement  consistency  control  conditions  for  the  global 
query  processing  over  the  dynamic  distributed  database. 


THEOREM-2

Unlike Theorem-1, here we define U_i as the set of all extended updates u^e upon the local database LDB_i (for each station S_i, i = 1, ..., N), where u^e can be either:

- a local update submitted to the station S_i, to be performed on the station database DB_i;

- the operation of deleting t from DB_i for SEND(t: S_i → S_j);

- the operation of inserting t into MT_i for SEND(t: S_j → S_i);

- the operation of deleting t from MT_i for REC(t: S_i);

- the operation of inserting t into DB_i for REC(t: S_i).

The terms of this theorem are exactly the same as those of Theorem-1, except that all updates now refer to extended updates instead of simple local updates.

Thus, the same synchronization technique (Sec. 4.1) can be used for both the concurrency control and the data movement consistency control, where simple local updates are replaced by extended updates. We call this the Extended Synchronization Technique.

Therefore, the algorithms we have presented for the concurrency control in global query processing (Sec. 5) can be easily extended to include the data movement consistency control, provided extended updates are considered.


6.1. Example of Execution


Let us consider an example of a simple execution of a global query which is interested in the location of the tuples t_i and t_j. The effect of Theorem-2 for data movement consistency control is illustrated.

Assume:

- t_i is in LDB_i1, then it is sent to LDB_i2;

- t_j is in LDB_j1, then it is sent to LDB_j2, and then to LDB_j3.

We consider the two global queries gq_1 and gq_2. Suppose that:

1) gq_1 sees t_i in LDB_i1 and t_j in LDB_j2;

2.a) gq_2 sees t_i in LDB_i2 and t_j in LDB_j3.

This meets the conditions of Theorem-2, and we have a correct execution (with regard to data movement consistency control), with gq_1 < gq_2 in the total ordering CQ. In fact t_i was really transferred from S_i1 to S_i2 and t_j from S_j2 to S_j3.

Now suppose that:

2.b) gq_2 sees t_i in LDB_i2 and t_j in LDB_j1.

This breaks the conditions of Theorem-2, and we have an incorrect execution. In fact if we consider gq_1 before gq_2, in the global execution corresponding to CQ, we would incorrectly infer that t_j was transferred from S_j2 to S_j1. If, in the other case, we consider gq_2 before gq_1 in the total ordering CQ, then we infer that t_i was transferred from S_i2 to S_i1; this is incorrect also.
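The kind of check made in this example can be sketched as a search for a total order of the global queries in which no tuple appears to move backwards along its actual path. The function name, the data layout and the database names ("i1", "j2", ...) are illustrative assumptions, and the sketch assumes that a tuple visits each database at most once.

```python
from itertools import permutations


def consistent_order(paths, observations):
    """paths: tuple name -> list of databases in the order visited.
    observations: query name -> {tuple name: database where it was seen}.
    Return a query order compatible with every path, or None."""
    for order in permutations(observations):
        compatible = True
        for name, path in paths.items():
            # Position on the path at which each query saw the tuple.
            positions = [path.index(observations[q][name]) for q in order]
            if positions != sorted(positions):
                compatible = False
                break
        if compatible:
            return list(order)
    return None


if __name__ == "__main__":
    paths = {"ti": ["i1", "i2"], "tj": ["j1", "j2", "j3"]}
    good = {"gq1": {"ti": "i1", "tj": "j2"}, "gq2": {"ti": "i2", "tj": "j3"}}
    bad = {"gq1": {"ti": "i1", "tj": "j2"}, "gq2": {"ti": "i2", "tj": "j1"}}
    print(consistent_order(paths, good))
    print(consistent_order(paths, bad))
```

The exhaustive search over permutations is only reasonable for a handful of queries; it is meant to make the correctness criterion concrete, not to suggest an implementation.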


6.2. Complete Data Movement Control


The data movement consistency control conditions, expressed in Theorem-2, do not ensure the complete correctness of the global query processing for a dynamic distributed database. Theorem-2 ensures that the information retrieved from each tuple accessed by a global query is consistent, whether the information concerns the data content of tuples or their location in a particular database. Nevertheless, since data is moving among the stations during the critical time when a global query is in execution, anomalous conditions may arise and compromise the correctness of the answer.


We define the critical period in the processing of a global query gq as the interval from the moment when lq is processed in the first station until the moment when lq is processed in the last station. During the critical period of a global query, it is the movement of data items from one station to another that causes the problems. For this reason the mail operation SEND must be controlled, while the mail operation REC does not need such control since it moves data items between databases (MT and DB) inside the same station. Next we give examples of these anomalies:


Anomalous Condition-1

In the critical period of gq, we have:

- a tuple t is first in LDB_i (at station S_i);

- lq is executed on LDB_j (j ≠ i);

- t is sent from S_i to S_j: SEND(t: S_i → S_j);

- lq is executed on LDB_i.

With this sequence of operations, gq misses the tuple t in DDB.
Anomalous Condition-2

In the critical period of gq, we have:

- a tuple t is first in LDB_i (in S_i);

- lq is executed on LDB_i;

- t is sent from S_i to S_j: SEND(t: S_i → S_j);

- lq is executed on LDB_j (in S_j, j ≠ i).

With this sequence of operations, gq sees the same tuple t in two local databases, LDB_i and LDB_j; that is, it counts t twice.


Complete Data Movement Control Condition

If a tuple t is in the distributed database DDB for the entire critical period of a global query gq, then lq must access t in a local database LDB_i, and must not access t in any other local database LDB_j (j ≠ i).

This condition, together with the conditions of Theorem-2, ensures the correct execution of a global query over the dynamic distributed database. These conditions completely solve the problem of concurrency and data movement control.
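The two anomalies and the condition that excludes them can be replayed with a small event trace. The sketch below is only an illustration: the event encoding and the function name are hypothetical, and a single tuple t is followed through one critical period.

```python
def count_sightings(events, start):
    """Count how many times the local queries of one gq see the tuple t.

    events -- list of ("lq", station) or ("send", source, destination)
    start  -- station holding t at the beginning of the critical period
    The complete data movement control condition requires the result 1.
    """
    location = start
    sightings = 0
    for event in events:
        if event[0] == "send":
            _, source, destination = event
            if source != location:
                raise ValueError("tuple is not at the sending station")
            location = destination
        elif event[0] == "lq":
            if event[1] == location:
                sightings += 1
    return sightings


if __name__ == "__main__":
    missed = [("lq", "Sj"), ("send", "Si", "Sj"), ("lq", "Si")]
    doubled = [("lq", "Si"), ("send", "Si", "Sj"), ("lq", "Sj")]
    print(count_sightings(missed, "Si"), count_sightings(doubled, "Si"))
```

A trace in which the SEND falls outside the critical period gives exactly one sighting, which is the behaviour the condition demands.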


7. DATA MOVEMENT CONTROL ALGORITHMS


The algorithms presented for concurrency control (Sec. 5) are here extended to include complete data movement control (Sec. 6.2).


7.1. Complete Algorithm for Multi-Version TS Ordering in an OIS Environment


Using this algorithm, in each local database LDB_i, for each local relation R_i, a new version VERS[R_i](u^e) is saved whenever an extended update u^e is performed on R_i (u^e being a result of a local update or data movement).

Following the algorithm of Sec. 5.2.1, we process lq on that VERS[R_i](u^e) such that the associated TV has the maximum value lower than TS(lq). Recall that TV is the Timestamp of u^e, that is, TS(u^e). A problem with this algorithm is that although it does satisfy the conditions of Theorem-2, it does not satisfy the complete data movement control condition.

For example, suppose we have SEND(t: S_p → S_i), in which:

- u1 is the deletion of t from DB_p;

- u2 is the insertion of t into MT_i.

TS(u1) is set according to the station S_p clock, while TS(u2) is set according to the station S_i clock.

Then we can have two cases:

1) TS(u1) < TS(u2)

In this case a gq, such that:

TS(u1) ≤ TS(gq) < TS(u2)

will miss the tuple t. lq will be processed on VERS[R_p](u1), from which t has been deleted, and on a version of R_i before VERS[R_i](u2), where t has not yet been inserted (Anomaly-1).

2) TS(u1) > TS(u2)

In this case a gq, such that:

TS(u1) > TS(gq) ≥ TS(u2)

will consider t twice. lq will be processed on VERS[R_i](u2), where t has just been inserted, and on a version of R_p before VERS[R_p](u1), where t has not yet been deleted (Anomaly-2).

Therefore, in order to respect the complete data movement control condition, we impose that, for each operation SEND(t: S_p → S_i), we must have:

TS(u1) = TS(u2)

The two versions VERS[R_p](u1) in S_p and VERS[R_i](u2) in S_i must be associated with the same Timestamp value, the same TV mark.


To achieve this, we proceed as follows: when t is sent from S_p, TS(u1) is temporarily set according to the station S_p clock. When t arrives in S_i and it is inserted into MT_i, if the station S_i clock value is lower than TS(u1), the clock is set forward to this Timestamp value. So TS(u2) is given according to the new station S_i clock value, which is equal to TS(u1).

If instead the station S_i clock value is higher than TS(u1), the station S_i clock is not modified and TS(u2) is given according to this clock. In this case, it is now necessary to send back to S_p the value of TS(u2). TS(u1) is changed to TS(u2) and the station S_p clock is changed accordingly (if necessary). During this time R_p (where t ∈ R_p) should be locked.

This algorithm respects the complete data movement control condition because the same Timestamps are given to the two extended updates for any SEND operation. This mechanism, which can require moving station clocks ahead, is perfectly in accordance with the technique of Lamport [11] used by the Reliable Network for the station clock synchronization (Sec. 3.1), guaranteeing a constant maximum difference of SCMD.
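The agreement on a common Timestamp for the two extended updates of a SEND can be sketched as below. This is a simplified model: the function name is hypothetical, clocks are integers sampled at the moment of the transfer, and the locking of the relation at the sending station while the value travels back is not modelled.

```python
def agree_send_timestamp(clock_p: int, clock_i: int):
    """Return (ts, new_clock_p, new_clock_i) for SEND(t: S_p -> S_i).

    ts is the single Timestamp finally given to both the deletion at
    S_p and the insertion at S_i.
    """
    ts_delete = clock_p                # provisional Timestamp of the deletion
    if clock_i < ts_delete:
        clock_i = ts_delete            # receiving clock is set forward
        ts_insert = clock_i
    else:
        ts_insert = clock_i            # receiving clock is left unchanged
        ts_delete = ts_insert          # value sent back to the sender
        clock_p = max(clock_p, ts_insert)   # sender clock moved ahead if needed
    assert ts_delete == ts_insert
    return ts_insert, clock_p, clock_i


if __name__ == "__main__":
    print(agree_send_timestamp(10, 7))
    print(agree_send_timestamp(5, 12))
```

Clocks are only ever moved forward, which is what keeps the scheme compatible with Lamport-style clock synchronization.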


7.2.  Complete  Algorithm  for  Conservative  TS  Ordering  in  an  OIS  Environment 


This algorithm forces the global queries to be totally ordered, in their execution, following their Timestamp order. Therefore, in each station S_i, lq_1 is processed before lq_2 iff TS(lq_1) < TS(lq_2). This algorithm does satisfy the conditions of Theorem-2, but does not satisfy the complete data movement control condition.

For example, Anomaly-1 (missed tuple) occurs with this succession of events:

1) lq is performed at S_j, while t is in LDB_i;

2) SEND(t: S_i → S_j);

3) lq is performed at S_i, while t is in LDB_j.

And Anomaly-2 (double counting) occurs if:

1) lq sees t in LDB_i;

2) SEND(t: S_i → S_j);

3) lq sees t in LDB_j.

In order to avoid these two anomalies, we must add further control conditions whenever a tuple is moved from one station to another. Let us suppose we have to perform SEND(t: S_i → S_j). If lq_i is the last global query performed in S_i before the sending of t, we can assume that all lq with TS(lq) ≤ TS(lq_i) have already seen t. If lq_j is the last global query that was performed in S_j before the arrival of t, we can suppose that all lq such that TS(lq) > TS(lq_j) will see t.

We now have three cases:

1) TS(lq_i) < TS(lq_j)

In this case, all lq such that:

TS(lq_i) < TS(lq) ≤ TS(lq_j)

would miss t, because they are performed in S_j before the arrival of t, and at S_i after the departure of t. Thus, we must save an image of t at S_i.

2) TS(lq_i) > TS(lq_j)

In this case, all lq such that:

TS(lq_j) < TS(lq) ≤ TS(lq_i)

would see t in S_j after they have already seen it in S_i. Thus, we must impose that they will not see t in LDB_j when they are performed at S_j (t will be masked in LDB_j from them).

3) TS(lq_i) = TS(lq_j)

In this case (lq_i = lq_j), no control condition is necessary, about t, for the execution of any lq, whether in S_i or S_j.
Movement Control Structures

We now describe the control structures and the procedures for this synchronization technique.

At each S_i:

- Control Relation NO_i, containing: [KEY(t), TS(lq_1)]

This means that every lq, such that TS(lq) ≤ TS(lq_1), when it is processed in S_i, will not see t inside LDB_i (to avoid seeing t twice or more).

- Local Database IDB_i (with DDB-Schema), containing the images of tuples which are really in the LDB_j (j ≠ i) of another station.

The purpose of IDB_i is to save, inside S_i, the images of some tuples, in order to avoid missing them by some global query.

- Control Relation YES_i, containing: [KEY(t), TS(lq_1), TS(lq_2)]

This means that every lq, such that TS(lq_1) < TS(lq) ≤ TS(lq_2), when it is processed in S_i, will see t whose image is saved in IDB_i. Therefore lq avoids missing t, even though it does not see t in the station where t is now located.


Movement  Control  Procedures 

A) Procedure for the control of moving a tuple t from a station S_i to another station S_j.

Definitions:

- lq_i is the last global query performed in S_i;

- lq_j is the last global query performed in S_j.

Procedure:

- At S_i:

  - if [KEY(t), TS(lq_m)] is in NO_i, then:

    - delete it from NO_i

    - TS ← TS(lq_m)

  - else: TS ← TS(lq_i)

- SEND(t: S_i → S_j); also TS is sent from S_i to S_j

- TS(lq_j) is sent from S_j to S_i

- At S_i:

  - if TS(lq_j) > TS, then:

    - t is inserted into IDB_i

    - [KEY(t), TS, TS(lq_j)] is inserted into YES_i

- At S_j:

  - if TS(lq_j) < TS, then:

    - [KEY(t), TS] is inserted into NO_j
B) Procedure for the control of the execution of a global query lq_n, with Timestamp TS(lq_n), at the station S_i.

Procedure:

- lq_n is executed on the following set of data at S_i:

  - the local database LDB_i (that is, DB_i ∪ MT_i)

  - minus all tuples t, belonging to LDB_i, such that [KEY(t), x] is in NO_i (x represents the "don't care" value)

  - plus all tuples t, belonging to IDB_i, such that:

    TS(lq_1) < TS(lq_n) ≤ TS(lq_2)

    where [KEY(t), TS(lq_1), TS(lq_2)] is in YES_i

- Delete from NO_i all [x, TS(lq_n)]

- Delete from YES_i all [x, x, TS(lq_n)].
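Procedures A and B can be exercised together in a small simulation. The sketch below is a reading of the two procedures under stated assumptions, not the report's implementation: the class and function names are hypothetical, DB and MT are merged into one dictionary, Timestamps are positive integers (0 meaning "no query yet"), and message exchange between stations is replaced by direct access to both objects.

```python
class Station:
    """One station S_i with the control structures NO_i, YES_i and IDB_i."""

    def __init__(self, name):
        self.name = name
        self.ldb = {}       # KEY(t) -> tuple; DB_i and MT_i taken together
        self.idb = {}       # KEY(t) -> saved image of a tuple now elsewhere
        self.no = set()     # (KEY(t), TS): hide t from lq with TS(lq) <= TS
        self.yes = set()    # (KEY(t), TS1, TS2): show image if TS1 < TS(lq) <= TS2
        self.last_ts = 0    # Timestamp of the last lq performed here

    def run_lq(self, ts):
        """Procedure B: return the keys visible to the lq with Timestamp ts."""
        masked = {key for key, _ in self.no}
        visible = {key for key in self.ldb if key not in masked}
        visible |= {key for key, ts1, ts2 in self.yes if ts1 < ts <= ts2}
        # Drop the control entries that end with this query.
        self.no = {entry for entry in self.no if entry[1] != ts}
        self.yes = {entry for entry in self.yes if entry[2] != ts}
        self.last_ts = ts
        return visible


def move(key, src, dst):
    """Procedure A: SEND(t: src -> dst) together with its control steps."""
    pending = [entry for entry in src.no if entry[0] == key]
    if pending:
        src.no.discard(pending[0])
        ts = pending[0][1]
    else:
        ts = src.last_ts
    image = src.ldb.pop(key)
    dst.ldb[key] = image
    if dst.last_ts > ts:        # some queries would miss t: keep an image
        src.idb[key] = image
        src.yes.add((key, ts, dst.last_ts))
    elif dst.last_ts < ts:      # some queries would see t twice: mask it
        dst.no.add((key, ts))


if __name__ == "__main__":
    s_i, s_j = Station("Si"), Station("Sj")
    s_i.ldb["t"] = "row"
    seen_at_j = s_j.run_lq(1)       # the query reaches S_j first
    move("t", s_i, s_j)
    seen_at_i = s_i.run_lq(1)       # and S_i only after the transfer
    print(seen_at_i, seen_at_j)
```

In both orderings of the query and the transfer, the query with Timestamp 1 ends up seeing t exactly once, which is what the complete data movement control condition asks for.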


7.3.  Algorithm  for  Data  Movement  Control  in  the  Centralized  Control  Model 


Now we examine the Centralized Control Model, presented in Sec. 5.1, in which a control node CN supervises all the stations S_i. This algorithm is similar to the
previous  one  (Complete  Algorithm  for  Conservative  TS  Ordering),  except  that  the  data 
movement  control  structures  and  procedures  are  primarily  located  in  the  central 
node.  Therefore  the  job  of  each  station  is  simplified  but  the  amount  of  activity  the  CN 
must  handle  may  be  extremely  large  (depending  upon  the  total  number  of  stations). 


Movement  Control  Structures 


At each station S_i:

- Control relation NO_i, containing: [KEY(t), TS(lq_1)]

This means that every lq, such that TS(lq) ≤ TS(lq_1), when it is processed in S_i, will not see t inside DB_i.

In the central node CN:

- Control relation NO, containing: [KEY(t), TS(lq_1)]

This means that every lq, such that TS(lq) ≤ TS(lq_1), when it is processed in CN on all the mail trays MT_i (i = 1, ..., N), will not see t.

- Local database IDB (with DDB-Schema), containing the images of tuples which are actually in the local database LDB_i of some station S_i.

- Control relation YES, containing: [KEY(t), i, TS(lq_1), TS(lq_2)]

This means that every lq, such that TS(lq_1) < TS(lq) ≤ TS(lq_2), when it is processed in CN, will also see t whose image is saved in IDB (t is assumed to be owned by S_i).
Movement Control Procedures

In this model we must control both SEND and REC operations because they both imply the movement of a tuple either to or from CN.

A) Procedure for the control of sending a tuple t from S_i to S_j: t is moved from DB_i (in S_i) to CN and inserted into the mail tray MT_j.

Definitions:

- lq_i is the last global query performed at S_i;

- lq_c is the last global query performed at CN; so it has been already performed at all stations.

Procedure:

- At S_i:

  - if [KEY(t), TS(lq_n)] is in NO_i, then:

    - delete it from NO_i

    - TS ← TS(lq_n)

  - else: TS ← TS(lq_i)

- SEND(t: S_i → S_j) (that is, t is sent to CN and put into MT_j); also TS is sent from S_i to CN.

- At CN:

  - if TS > TS(lq_c), then:

    - [KEY(t), TS] is inserted into NO.

B) Procedure for the control of receiving a tuple t at S_j: t is moved from MT_j at CN to DB_j at S_j.

Definitions:

- lq_j is the last global query performed at S_j;

- lq_c is the last global query performed at CN.

Procedure:

- At CN:

  - if [KEY(t), TS(lq_n)] is in NO, then:

    - delete it from NO

    - TS ← TS(lq_n)

  - else: TS ← TS(lq_c)

- REC(t: S_j) (that is, t is sent to S_j and inserted into DB_j); also TS is sent from CN to S_j.

- At CN:

  - if TS(lq_j) > TS, then:

    - t is inserted into IDB

    - [KEY(t), j, TS, TS(lq_j)] is inserted into YES

- At S_j:

  - if TS(lq_j) < TS, then:

    - [KEY(t), TS] is inserted into NO_j.

C) Procedure for the control of the execution of a global query lq_n, with Timestamp TS(lq_n), at the station S_i.

Procedure:

- lq_n is executed on the following set of data at S_i:

  - the local database LDB_i (that is, DB_i)

  - minus all the tuples t, belonging to DB_i, such that [KEY(t), x] is in NO_i (x represents the "don't care" value)

- Delete from NO_i all [x, TS(lq_n)].

D) Procedure for the control of the execution of a global query lq_c, with Timestamp TS(lq_c), at the control node CN. Note that at this point lq_c has been already performed at all stations S_i (i = 1, ..., N).

Procedure:

- lq_c is executed on the following set of data at CN:

  - all the mail trays MT_i (i = 1, ..., N)

  - minus all tuples t, belonging to some mail tray MT_i, such that [KEY(t), x] is in NO

  - plus all tuples t, belonging to IDB, such that:

    TS(lq_1) < TS(lq_c) ≤ TS(lq_2)

    where [KEY(t), x, TS(lq_1), TS(lq_2)] is in YES

- Delete from NO all [x, TS(lq_c)]

- Delete from YES all [x, x, x, TS(lq_c)].


8. GLOBAL QUERY PROCESSING OPTIMIZATION IN AN OIS ENVIRONMENT

Consider the processing of a global query gq with Master Station S_M. We have studied how to correctly execute lq, the set of projection and restriction clauses specified in gq, on each local database LDB_j. This involves the control of concurrency and data movement. When all local reductions lq(LDB_j) have been computed and saved at each S_j, the problem is to collect these results at S_M. If gq does not have any join clause, we can send each lq(LDB_j) from S_j to S_M. But if gq does have one or more join clauses, we could reduce the various lq(LDB_j) at each S_j before sending them to S_M.

This problem is important because we know that, due to the high number of stations, the total DDB may be very large even if each individual LDB_j is not very large. So, the total dimension of all lq(LDB_j) can be rather considerable. It is quite desirable to reduce each lq(LDB_j) before sending it, in order to save time and to avoid communication network overload. Moreover, the join operations can highly reduce the size of the local results from the previous reduction and restriction operations.

Unfortunately, this problem is very difficult. It is difficult in the usual distributed database environment such as SDD-1, and it is even more difficult in our distributed database model, in which the relations are so highly fragmented. In effect, the problem here is that each relation of the global distributed database DDB is partitioned into a large number of local relations R_j, and so the usual techniques of global query processing, using cost-beneficial semi-joins between relations [6], are quite ineffective in most situations.
We report here the results of a study of the optimal global query execution strategy for highly fragmented distributed databases [14], as is the case with our model. Here we will consider the execution strategy for global queries with only an equality join (the most common case). The results are extended in [14] to more complex global queries with several join clauses.

For our probabilistic model we assume a uniform distribution of attribute values for each relation (both inside local databases and among different local databases), and statistical independence of different attributes.

We have the global query gq:

\bar{R}^1 [A = A] \bar{R}^2

where \bar{R}^1 and \bar{R}^2 are the results of restrictions and projections, specified in gq, on the relations R^1 and R^2. \bar{R}_j^1 and \bar{R}_j^2 are the portions of \bar{R}^1 and \bar{R}^2 in the station S_j. Thus, we have:

\bar{R}_j^1 ∪ \bar{R}_j^2 = lq(LDB_j)

The problem is to reduce, where cost-effective, \bar{R}_j^1 and \bar{R}_j^2 at each S_j, using the join \bar{R}^1 [A = A] \bar{R}^2, before gathering all \bar{R}_j^1 and \bar{R}_j^2 at the Master Station S_M and computing the final answer to gq, that is gq(DDB).


8.1. Information Gathering


We need some information about the reduced relations \bar{R}^1 and \bar{R}^2, their local portions \bar{R}_j^1 and \bar{R}_j^2, and the attribute A, in order to choose the best strategy for calculating \bar{R}^1 [A = A] \bar{R}^2. We call \bar{\bar{R}}^1 and \bar{\bar{R}}^2 the reductions of \bar{R}^1 and \bar{R}^2 caused by the join operation \bar{R}^1 [A = A] \bar{R}^2.

For each relation R, whether local or global, reduced or not, and for each attribute A appearing in it, we define:

- card(R): number of distinct tuples in R (they could overlap only if R was previously reduced);

- width(R): number of bytes per tuple in R;

- width(A): number of bytes in attribute field A (we assume that A has the same width in each relation in which it appears);

- card(R[A]): number of distinct values in R[A].

This allows the following parameters to be defined:

- Width factor between the relation and the attribute A:

w_1 = width(\bar{R}^1) / width(A)   (w_1 ≥ 1)

w_2 = width(\bar{R}^2) / width(A)   (w_2 ≥ 1)

- Overlapping factor for A values inside each local relation:

l_1 = card(\bar{R}_j^1[A]) / card(\bar{R}_j^1)   (l_1 ≤ 1)

l_2 = card(\bar{R}_j^2[A]) / card(\bar{R}_j^2)   (l_2 ≤ 1)

Note that the values l_1 and l_2 are independent of the particular station S_j, in accordance with our probabilistic assumptions.

- Dimension factor between two relations:

z = card(\bar{R}^2) / card(\bar{R}^1)

- Reduction factor between the local relation and its reduction after the join operation:

r_1 = card(\bar{\bar{R}}_j^1) / card(\bar{R}_j^1)   (r_1 ≤ 1)

r_2 = card(\bar{\bar{R}}_j^2) / card(\bar{R}_j^2)   (r_2 ≤ 1)

- Fragmentation factor (for each station S_j) indicating the portion of the relation inside the station:

n_j^1 = card(\bar{R}_j^1) / card(\bar{R}^1)

n_j^2 = card(\bar{R}_j^2) / card(\bar{R}^2)

Moreover we have other factors which are occasionally computed off-line and are known to all stations [14]. These allow us to compute card(dom(A)), the cardinality of the domain of attribute A (number of distinct values of the attribute A in all relations in which it appears).

For the processing of the global query gq, each Slave Station S_j must send to S_M the following information (as soon as S_j has executed lq on the local database LDB_j and determined \bar{R}_j^1 and \bar{R}_j^2):

card(\bar{R}_j^1), card(\bar{R}_j^1[A])

card(\bar{R}_j^2), card(\bar{R}_j^2[A])

When S_M has this information from each station S_j, it can compute w_1, w_2, ... (for all j) [14].
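How the Master Station might turn the gathered cardinalities into the factors defined above can be sketched as follows. This is only an illustration with assumptions that are not taken from [14]: the function and key names are hypothetical, the fragments are assumed disjoint so that global cardinalities are sums of local ones, the overlapping factors are pooled over all stations, the tuple widths are assumed known from the schema, and z is taken as the size of the second relation over the first. The reduction factors r_1 and r_2 are not computed here.

```python
def gather_parameters(stats, width_r1, width_r2, width_a):
    """stats: station -> dict with card1, card1_a, card2, card2_a, i.e. the
    four cardinalities each Slave Station reports to the Master Station."""
    total1 = sum(s["card1"] for s in stats.values())
    total2 = sum(s["card2"] for s in stats.values())
    values1 = sum(s["card1_a"] for s in stats.values())
    values2 = sum(s["card2_a"] for s in stats.values())
    return {
        "w1": width_r1 / width_a,          # width factors
        "w2": width_r2 / width_a,
        "l1": values1 / total1,            # overlapping factors (pooled)
        "l2": values2 / total2,
        "z": total2 / total1,              # dimension factor (assumed direction)
        "n1": {j: s["card1"] / total1 for j, s in stats.items()},
        "n2": {j: s["card2"] / total2 for j, s in stats.items()},
    }


if __name__ == "__main__":
    stats = {
        "S1": {"card1": 100, "card1_a": 50, "card2": 40, "card2_a": 40},
        "S2": {"card1": 300, "card1_a": 150, "card2": 160, "card2_a": 160},
    }
    print(gather_parameters(stats, width_r1=80, width_r2=40, width_a=8))
```

Only four integers per station travel over the network, which is negligible next to the data transfers the policies below try to minimize.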


8.2.  Execution  Policies 


Having this information, S_M can decide the best strategy for the processing of \bar{R}^1 [A = A] \bar{R}^2. We consider three execution policies.

P1: This employs the simple centralizing technique: all data \bar{R}_j^1 and \bar{R}_j^2 (in the local reduction lq(LDB_j)) are sent from all Slave Stations S_j (j ≠ M) to the Master Station S_M, where the join is performed.

P2: This employs the semi-join technique, in order to reduce each \bar{R}_j^1 and \bar{R}_j^2 (in each local reduction lq(LDB_j)) into \bar{\bar{R}}_j^1 and \bar{\bar{R}}_j^2. To reduce \bar{R}_j^1 to \bar{\bar{R}}_j^1 (\bar{R}_j^2 to \bar{\bar{R}}_j^2), each station S_i must send \bar{R}_i^2[A] to S_j (\bar{R}_i^1[A] to S_j). Finally, each S_j sends \bar{\bar{R}}_j^1 and \bar{\bar{R}}_j^2 to S_M.

P3: This employs the technique of performing the join, on the attribute values of A, inside S_M, then reducing each \bar{R}_j^1 and \bar{R}_j^2 to \bar{\bar{R}}_j^1 and \bar{\bar{R}}_j^2, before collecting them. First all \bar{R}_j^1[A] and \bar{R}_j^2[A] are collected in S_M; \bar{R}^1 [A = A] \bar{R}^2 is executed on them and they are reduced to \bar{\bar{R}}_j^1[A] and \bar{\bar{R}}_j^2[A]. Every \bar{\bar{R}}_j^1[A] and \bar{\bar{R}}_j^2[A] is sent back from S_M to S_j, where \bar{R}_j^1 and \bar{R}_j^2 are so reduced to \bar{\bar{R}}_j^1 and \bar{\bar{R}}_j^2. Finally, each S_j sends \bar{\bar{R}}_j^1 and \bar{\bar{R}}_j^2 to S_M.


8.3. Policy Cost Comparison


The goal in our query optimization study is to collect at the Master Station S_M all data necessary in order to compute gq(DDB) with the minimum quantity of inter-site data transfer. Due to the very large number of stations connected through the communication network, we consider the network bandwidth to be the system bottleneck. Thus, we assume the cost of a particular policy to be simply the amount of data that is transferred. Other cost factors, such as local computation load, distance effect of communications in a particular network topology, overhead cost for message transmissions, are here considered as secondary and are not taken into account.

Let C1, C2, C3 be the cost of P1, P2, P3.


Comparison between P1 and P2

If N > (w_1/l_1)(1 - r_1)(1 - n_M^1) - n_M^1 + 2 and N > (w_2/l_2)(1 - r_2)(1 - n_M^2) - n_M^2 + 2

then C1 < C2 [14].

Simplifying, if N > w_1/l_1 + 2 and N > w_2/l_2 + 2

we have that C1 < C2, and P1 is better than P2.

In a real situation we expect w_1/l_1 and w_2/l_2 to be in the range 1-20 and N, the total number of stations, to be very large in an OIS (N > 100). Thus, the centralization policy is globally better than the semi-join policy.


Comparison between P1 and P3

If r_1 < (w_1 - l_1)/(w_1 + l_1) and r_2 < (w_2 - l_2)/(w_2 + l_2)

then C3 < C1 [14].

Simplifying, if r_1 < (w_1 - 1)/(w_1 + 1) and r_2 < (w_2 - 1)/(w_2 + 1)

we have that C3 < C1, and P3 is better than P1.

Usually in real systems w_1 and w_2 are greater than 2, and a likely mean value of r_1 and r_2 is [13]. Thus, P3 is globally better than the raw centralizing technique.
Comparison between P2 and P3

If N > (1 - n_M^1)(2 + r_1) + 1 and N > (1 - n_M^2)(2 + r_2) + 1

then C3 < C2 [14].

Simplifying, if N > 5 we have that C3 < C2, and P3 is better than P2.

This condition always holds in a real OIS, where N is quite large. Thus, P3 is globally better than the semi-join technique.
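The three comparisons can be checked numerically with a simple per-relation cost model. The model below is a reconstruction chosen so that it reproduces the stated conditions; the detailed derivation is in [14] and may account for the costs differently. Costs are expressed in units of card(relation) x width(A), and the function and parameter names are hypothetical.

```python
def policy_costs(w, l, r, n_m, n_stations):
    """Data-transfer cost of P1, P2, P3 for one relation.

    w -- width factor, l -- overlapping factor, r -- reduction factor,
    n_m -- fraction of the relation already at the Master Station,
    n_stations -- total number of stations N.
    """
    reduced = r * w * (1 - n_m)                  # shipping the reduced relation
    c1 = w * (1 - n_m)                           # P1: ship everything
    c2 = l * (n_stations - 2 + n_m) + reduced    # P2: A values to every station
    c3 = l * (1 - n_m) * (1 + r) + reduced       # P3: A values to S_M and back
    return c1, c2, c3


if __name__ == "__main__":
    c1, c2, c3 = policy_costs(w=10, l=0.5, r=0.3, n_m=0.01, n_stations=200)
    print(round(c1, 4), round(c2, 4), round(c3, 4))
```

With many stations the broadcast term of P2 grows linearly in N while the costs of P1 and P3 do not depend on N at all, which is the intuition behind the two conclusions that P2 loses to both other policies in an OIS.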


8.4. The Composition Strategy


From the previous discussion we can see that P3 is generally the best execution policy for global queries of the type: \bar{R}^1 [A = A] \bar{R}^2.

Yet, it is not true that it is the optimal strategy. For example, we know that P2, which implies the use of semi-join reductions on all \bar{R}_j^1 and \bar{R}_j^2, is generally worse than P1 and P3. Nevertheless this does not exclude the possibility that P2 could be better than both P1 and P3 for the selective reduction of some \bar{R}_j^1 and \bar{R}_j^2. Thus, we can estimate an optimal strategy if we choose case by case the best policy, among P1, P2 and P3, to apply individually to each \bar{R}_j^1 and \bar{R}_j^2, in order to have the result of \bar{R}^1 [A = A] \bar{R}^2 in S_M with the minimum data transfer cost. This is called the composition strategy.

Let C1_j^1, C1_j^2 be the costs of P1 over \bar{R}_j^1, \bar{R}_j^2. In the same manner we define C2_j^1, C2_j^2 for P2 and C3_j^1, C3_j^2 for P3.

Following the composition strategy, the Master Station S_M chooses the best policy to apply, case by case, for each local reduction \bar{R}_j^1 and \bar{R}_j^2, and in each Slave Station S_j. For \bar{R}_j^1 (\bar{R}_j^2), S_M will choose the policy ensuring the minimum estimated cost, the lowest among C1_j^1, C2_j^1 and C3_j^1 (C1_j^2, C2_j^2 and C3_j^2).

We have, for each S_j:

- local comparison of P1 and P2 [14]:

C1_j^1 > C2_j^1 iff n_j^1 > z (l_2/w_1) (1 - n_j^2)/(1 - r_1)

C1_j^2 > C2_j^2 iff n_j^2 > (1/z) (l_1/w_2) (1 - n_j^1)/(1 - r_2)

- local comparison of P1 and P3:

C1_j^1 > C3_j^1 iff r_1 < (w_1 - l_1)/(w_1 + l_1)

C1_j^2 > C3_j^2 iff r_2 < (w_2 - l_2)/(w_2 + l_2)

Note that these conditions are independent of the particular station S_j (that is, of n_j^1 and n_j^2).

- local comparison of P2 and P3 [14]:

C3_j^1 > C2_j^1 iff n_j^1 > z (l_2/l_1) (1 - n_j^2)/(1 + r_1)

C3_j^2 > C2_j^2 iff n_j^2 > (1/z) (l_1/l_2) (1 - n_j^1)/(1 + r_2)

These formulas tell us that for processing R_j^1 (R_j^2):

- P3 is better than P1 in every case, unless the width of attribute A is almost equal to
the width of the reduction R^1 (R^2).

- P2 is better than the better of P1 and P3 only if the size of R_j^1 (R_j^2), in number
of tuples, represents a very large part of the total size of R^1 (R^2).

From this result it follows that the semi-join technique (P2) is not in general
useful in an OIS environment (even though it is often used in traditional distributed
database systems [6]). In most cases, P3 appears to be the best policy (that is,
centralized join execution on the A values of R^1 and R^2 and then reduction of the
local R_j^1 and R_j^2).


8.5.  Multi-join  Clauses


Now suppose we must process a global query gq involving more than two
relations (R^i with i = 1, 2, ..., T). Assume gq contains several equality joins
(inequality joins are treated using complete centralization, being very difficult to
analyse in an OIS environment) of the type:

R^{i1}[A=A]R^{i2}

As above, the projections and restrictions on R^i, specified in gq, are
performed locally, yielding R_j^i. lq is executed on each station using the concurrency
control algorithms of Sec. 5 and the data movement control algorithms of Sec. 7. Thus,
we now have for each S_j:

lq(LDB_j) = R_j^1 U R_j^2 U ... U R_j^T

where R_j^1, R_j^2, ..., R_j^T are the reductions of the relations in DDB appearing in gq.

The strategy for the processing of a multi-join gq, presented in [14], is
based on the results found in the study of the optimal strategy for R^1[A=A]R^2. Here
too the goal is to reduce, when cost-effective, each R_j^i to a further reduced relation.
This is done as many times as possible and for all [A=A] clauses contained in gq.

This is accomplished step by step: first one join R^{i1}[A=A]R^{i2} is executed and the
result, called R, is gathered in the Master Station S_M. Then R[A=A]R^{i3} is performed
and a new R is gathered in S_M, and so on until all the required data is at S_M (in the
final R).


This strategy, unlike the one for the simple join R^1[A=A]R^2, is suboptimal
since  no  look-ahead  or  back-up  technique  is  used.  Yet,  this  strategy  can  be  considered 
acceptable  in  real  systems  since  multi-join  global  queries  are  rather  infrequent  and  the 
eventual  implementation  of  an  optimal  strategy  is  extremely  complex. 
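
The step-by-step evaluation can be illustrated with a short Python sketch. It is only a simplified model under stated assumptions: relations are represented as lists of dictionaries, all data is assumed to be already available at the Master Station, and the cost-based choice of reduction policy at each step is left out. The helper names and the sample relations are hypothetical.

```python
def equi_join(left, right, attr):
    """Equality join [A=A] of two relations on a shared attribute name."""
    index = {}
    for row in right:
        index.setdefault(row[attr], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[attr], [])]


def multi_join(relations, first, steps):
    """Evaluate the joins one at a time, with no look-ahead.

    relations: name -> list of dict tuples
    first:     name of the relation that starts the chain
    steps:     list of (attribute, relation name) pairs, in execution order
    """
    result = relations[first]          # the running result R
    for attr, name in steps:
        result = equi_join(result, relations[name], attr)
    return result


# Hypothetical sample data.
relations = {
    "orders": [
        {"order_no": 1, "cust_no": 10},
        {"order_no": 2, "cust_no": 11},
    ],
    "customers": [{"cust_no": 10, "city": "Toronto"}],
    "shipments": [
        {"order_no": 1, "carrier": "rail"},
        {"order_no": 2, "carrier": "air"},
    ],
}

result = multi_join(
    relations,
    "orders",
    [("cust_no", "customers"), ("order_no", "shipments")],
)
```

Because the joins are taken in the order given, a different ordering of `steps` may produce larger intermediate results; this is the sense in which the strategy is suboptimal.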

9.  IMPLEMENTATION


The concepts discussed in this paper have been applied in the implementation
of the global query facilities of a prototype office information system: OFS.

OFS (Office Form System) is an office information system based on the form
concept [15]. A working prototype of OFS was implemented by the Computer Systems
Research Group of the University of Toronto on a local network of microcomputers.

A set of facilities for manipulation of forms and their content is present in
OFS. Forms can be created, stored, retrieved, modified, copied, mailed, traced and
located by office workstations.

OFS stores and manages forms as tuples in the relational data model. In the
OFS environment there is a strict correspondence between the following objects:

form type and relation
form occurrence and tuple
form field and attribute

They  are  the  same  objects,  considered  either  from  the  view  of  the  office  information 
system  or  from  the  view  of  the  database  management  system. 
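
The correspondence can be made concrete with a small Python sketch. It is purely illustrative and does not reproduce the OFS or MRS interfaces: the class, its methods and the sample "order" form are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, List, Tuple


@dataclass
class FormType:
    """A form type, which the database system sees as a relation."""

    name: str
    fields: Tuple[str, ...]                      # form fields = attributes
    occurrences: List[Tuple[Any, ...]] = field(default_factory=list)

    def fill(self, **values: Any) -> Tuple[Any, ...]:
        """Create a form occurrence and store it as one tuple."""
        if set(values) != set(self.fields):
            raise ValueError("values must match the form fields exactly")
        row = tuple(values[f] for f in self.fields)
        self.occurrences.append(row)
        return row

    def select(self, **conditions: Any) -> List[Tuple[Any, ...]]:
        """Return the occurrences whose fields equal the given values."""
        pos = {f: i for i, f in enumerate(self.fields)}
        return [
            row
            for row in self.occurrences
            if all(row[pos[f]] == v for f, v in conditions.items())
        ]


# Hypothetical form type with two occurrences.
order = FormType("order", ("order_no", "customer", "amount"))
order.fill(order_no=1, customer="Acme", amount=250)
order.fill(order_no=2, customer="Bolt", amount=75)
acme_orders = order.select(customer="Acme")
```

Filling in a form and inserting a tuple are the same operation here, seen from the office side and from the database side respectively.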

The  data  on  a  form  can  be  processed  in  a  completely  integrated  manner 
through  data  base  commands.  The  OFS  database  access  language  is  form  oriented  with 
a  user  interface  similar  to  Query  by  Example  [16].  Inside  each  workstation,  the  local 
database (in our model: LDB_j) is handled by a relational database management system
known  as  MRS  (Micro  Relational  System).  MRS  was  especially  designed  for  microcom¬ 
puter  systems  [17].  Office  procedures  based  on  forms  can  also  be  specified  in  OFS  [18]. 
These  procedures  are  triggered  automatically  and  perform  prespecified  actions  on 
forms. 


The OFS prototype, with global query processing facilities, has been
implemented upon a star type local network with central switch [19]. The control node
and the station nodes are DEC PDP-11/23 microcomputers [20]. All these machines run
under the UNIX operating system [21]. All source code has been written in the C
programming language [22].


The  distributed  database  in  OFS  is  characterized  by  the  special  properties 
we  presented  for  distributed  databases  in  general  Office  Information  Systems.  The  OFS 
distributed  database  is  organized  according  to  the  Centralized  Control  Model  (Sec. 5.1). 
Thus,  in  the  implementation  of  the  global  query  processing  in  OFS,  the  procedures  for 
Centralized Concurrency Control (Sec. 5.1) and Centralized Data Movement Control
(Sec. 7.3) have been used. For a global query execution strategy, the Centralizing
Policy (Sec. 8.2) has been temporarily chosen, mainly for simplicity of implementation.


THE  DEATH  OF  THE  COMPUTER  CENTER

D.  Tsichritzis 

Computer  Systems  Research  Group 
University  of  Toronto 


Feudal  castles 

Some computer centers resemble feudal castles with rigid operating
procedures which make them very forbidding for computer users to come close
and get anything done. It is getting to the point that even finding the right
person in the center is difficult, let alone getting some help. If a user has a small
problem he is on his own. If he has a larger problem he has to wait for months
for a decision on whether anybody is going to help him.

In the meantime the center management spends its time lobbying at high
levels, trying to get more money to expand. They are armed with statistics
about system loads and computer usage which they interpret as usefulness to
the organization and its computer users. Their argument is that since the
systems are so heavily used, the users must be happy. If the organization
spends more money they can provide more of the same thing and the users will
be even happier.

The computer center personnel splits into three categories. First, the systems
programmers, who do not want to be bothered by outsiders. They want to be left
alone to babysit the system. Second, the application programmers, who are
disoriented as they are shifted continuously to try to make pressure groups happy.
Finally, there is an increasing number of persons who comprise the "organization",
who do accounting, produce newsletters, do training etc., and who do not care
very much. They have found a niche in the status quo and are happy to continue
being part of the bureaucracy. They are accountable only to the castle lords
and not to the rest of the organization. After all, the computer center is a tight
organization which grew and operates as a separate entity.

This  situation  seems  great.  The  management  of  the  organization  thinks 
they  have  a  well  run  castle,  chiefly  because  they  get  their  input  from  the  com¬ 
puter  center  managers.  The  mainframe  manufacturers  are  happy  as  they  con¬ 
tinue  to  deal  with  people  with  similar  tastes.  The  computer  center  personnel  is 
happy  as  long  as  they  get  their  raises  and  if  not  they  pack  up  and  go  somewhere 
else. All is nice and peaceful in the land, except for the appearance of the
chieftains and the bandits.

The  chieftains  and  the  bandits 

The chieftains are division chiefs of the organization who are getting fed up
with long delays and unsatisfactory results. They need information system
support, they need it now and in a form which they can use. They need solutions to
their problems; they do not want computer time, or application programmers
who do not understand their situation. The chieftains are finding the money and
they go to service companies or even set up their own small computer centers. All
this is dismissed by the computer center management as nonsense. The service
companies are more expensive, they argue. Yes, but they are also more
responsive. The minicomputers do not have the right software and are not cost
effective, they argue. Yes, but some minicomputers give you a very good bang
for your buck and their software is evolving in leaps and bounds. As a matter of
fact, since minicomputers take advantage of the latest advances of both
hardware and software, they are getting increasingly competitive. So we have
appearing in the land small fortifications, which are fiercely independent from
the computer center. They are lean and mean operations which are very cost
effective since they pay everything out of their own budget in real dollars.

And then, there are the bandits. The bandits are individuals who get fed up
with all organizational computing services. They arm themselves with a personal
computer and learn to use it by getting advice from their sons and daughters.
Pretty soon, they can do some magnificent things, which they proudly display to
their colleagues. Again this situation is dismissed by the computer center
management. Who wants to use a microcomputer, when he has access through a
terminal to a large machine? Yes, but the microcomputers are easier to use.
They do not have good peripherals and adequate secondary storage. Yes, but
they are getting them. They are not cost effective. Terminals connected to a
large system give cheaper computing. Yes, on the hardware side. Add to this all
the personnel costs of the center, and maybe you come up with another
number. Plus microcomputers are fun and they are your own.

So, all is not peaceful in the land. The computer center barricades itself in
its castle and fights tooth and nail with the chieftains and the bandits. In the
meantime the chieftains and the bandits get increasingly hostile and
independent as they are being persecuted. They view the computer center as an enemy
who is trying to suppress or discredit their efforts, rather than the helpful
service it is supposed to be. And, as there is friction and strife in the land, the
organization is paying a high cost.

How  did  we  get  there 

Whose fault is it and how did we get there? Frankly, nobody's and
everybody's. It is a natural development from historical circumstances.

Computers used to be very expensive. There was an obvious need for a
centralized approach, where the computers were shared by the users to minimize
costs. Partly because of the sharing and partly because we did not have the
right tools, the operation of computers was very difficult. As a result, a vast
army of priests was hired to be around them. The attitude was "bring us your
problem and we will solve it on the computer". Sometimes, they would solve a
different problem, but what can you do. This body of priests developed some
fairly established principles. First, the users do not know what they want. Yes,
but users have complex requirements which are not easy to define. Second,
users can never learn to use the computers. Yes, but it was due to the terrible
user interfaces that computer systems used to have. Third, computers are
expensive tools. They can be wasted by the initiated, but not by the common
folks. Fourth, systems which are very difficult to use are not all that bad.
After all, we got to learn them and it is fun to know something which nobody else
knows or can easily learn. This, in essence, keeps the priesthood apart from
the folks.

The  priests  evolved  in  three  different  ways.  Some  of  them,  the  idealistic 
intellectuals,  became  missionaries  understanding  new  technologies  like  net¬ 
works  and  microprocessors.  They  moved  laterally  in  the  priesthood  and  became 
the  movers  and  shakers  of  small  high  technology  companies.  Some  of  the  pri¬ 
ests,  the  aggressive  and  pragmatic,  moved  up  to  become  bishops.  They 
comprise  the  management  of  the  centers  and  they  do  not  have  much  time  to 
understand  new  ideas.  They  perceive  their  job  as  bishops  managing  priests  and 
not  theologians  redefining  the  faith.  Finally,  some  of  the  priests  remained  rank 
and  file  priests.  They  are  too  lazy  to  become  missionaries  and  too  scared  to 
become  bishops. 

It is an intellectually comfortable environment. The priests and bishops of
the user companies are getting along fine with the priests and bishops of the
mainframe manufacturers. They have a common faith. This way the center
management is getting very comfortable projections of things to come by
mainframe manufacturers. In the other direction, the manufacturer representatives
get very rational projections of needs. If only these missionaries, bandits,
chieftains and all these smart alecks and outlaws did not rock the boat.

What  can  be  done 

It  is  about  time  that  computer  center  managers  organize  their  forces  and 
come  out  of  the  castle.  They  have  to  provide  the  new  ideas,  the  leading  edge,  if 
they  have  any  hope  of  controlling  the  situation. 

The chieftains sorely need computer expertise as they evaluate alternatives
for their systems. Computer centers ought to help them choose the right
system, rather than trying to stop them. There is also a tremendous need for
operational expertise in setting up and running the small minicomputer centers.
The computer centers can help the chieftains set up their systems and maybe
do facilities management for them. By that I do not mean take them over. Leave
them their independence, but give them a helping hand. They will be chieftains
but they will be allies at least.

The bandits are being bombarded by arms dealers. Word processing
manufacturers, microcomputer manufacturers, office equipment manufacturers,
all compete telling the bandits what equipment they ought to use. The
computer center can provide a very useful service by advising users what personal
computers they should get and how to hook them, if need be, to the larger
systems. Maybe they can buy personal computers in bulk and resell them to the
individual users. They can also set up a consulting service giving advice on
programming packages. Finally, they can set up a stockroom of spare parts and
hire some technicians to do hardware maintenance for the machines. This way
the bandits would become scouts. Fiercely independent, but believing in a com¬

The  death  of  the  center


Does this mean a demise of the castle? Of course not. Only the demise of
the castle mentality. Computer centers are absolutely needed for running the
large institutional applications that every organization has. They can provide
large storage facilities and fancy peripherals. They are also indispensable as
centers of expertise. However, they should change their view of the world and
their attitude towards users. They cannot afford to antagonize the users trying
desperately to solve their problems. They should help them and maybe even
adopt some of their ideas. Computer centers do not have a monopoly on
thinking up computer oriented solutions.

The death of the computer center? Not yet, not for a long time. The death
of some computer center managers? Possibly, if they are not careful, and
cannot control the revolt in the land. The demise of priests? Of course not; there is
a lot of room for everybody in a growing religion.